Objective Evaluation of Approaches of Skin Detection using ROC Analysis

Abstract

Skin detection is an important indicator of human presence and actions in many domains, including interaction, interfaces and security. It is commonly performed in three steps: transforming the pixel color to a non-RGB colorspace, dropping the illuminance component of skin color, and classifying by modeling the skin color distribution. In this paper, we evaluate the effect of these three steps on skin detection performance. The contribution of this study is a new, comprehensive methodology for testing colorspaces and color models that allows the best choices to be made for skin detection. Combinations of nine colorspaces, the presence or absence of the illuminance component, and two color modeling approaches are compared for different settings (indoor or outdoor) and modeling parameters (the histogram size). The performance is measured using a receiver operating characteristic (ROC) curve on a large dataset of 845 images (consisting of more than 18.6 million pixels) with manual ground truth. The results reveal that (1) colorspace transformations can improve performance in certain instances, (2) the absence of the illuminance component decreases performance, and (3) skin color modeling has a greater impact than colorspace transformation. We found that the best performance was obtained on indoor images by transforming the pixel color to the HSI or SCT colorspaces, keeping the illuminance component, and modeling the color with the histogram approach using a larger size distribution.
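The three steps named in the abstract (colorspace transformation, optional removal of the illuminance component, and histogram-based color modeling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes YIQ as the colorspace (one of the nine compared, and available in Python's standard `colorsys` module), treats Y as the illuminance channel, and scores a pixel by the fraction of training pixels in its bin that were skin. The names `to_feature`, `HistogramSkinModel`, and the channel ranges used for binning are hypothetical.

```python
import colorsys
from collections import Counter

# Assumed approximate ranges of the Y, I, Q channels returned by colorsys.
# They are used only to place values into bins; out-of-range values are clamped.
CHANNEL_RANGES = [(0.0, 1.0), (-0.6, 0.6), (-0.53, 0.53)]


def to_feature(rgb, keep_illuminance=True):
    """Convert an 8-bit RGB pixel to YIQ, optionally dropping the Y channel."""
    r, g, b = (v / 255.0 for v in rgb)
    y, i, q = colorsys.rgb_to_yiq(r, g, b)
    return (y, i, q) if keep_illuminance else (i, q)


class HistogramSkinModel:
    """Hypothetical histogram model: one count table per class."""

    def __init__(self, bins=32, keep_illuminance=True):
        self.bins = bins
        self.keep_illuminance = keep_illuminance
        self.skin = Counter()
        self.non_skin = Counter()

    def _key(self, rgb):
        # Quantize each retained channel into `bins` equal-width bins.
        feature = to_feature(rgb, self.keep_illuminance)
        ranges = CHANNEL_RANGES if self.keep_illuminance else CHANNEL_RANGES[1:]
        key = []
        for value, (lo, hi) in zip(feature, ranges):
            k = int((value - lo) / (hi - lo) * self.bins)
            key.append(min(max(k, 0), self.bins - 1))
        return tuple(key)

    def fit(self, pixels, labels):
        """Count training pixels per bin, separately for skin and non-skin."""
        for rgb, is_skin in zip(pixels, labels):
            target = self.skin if is_skin else self.non_skin
            target[self._key(rgb)] += 1

    def score(self, rgb):
        """Fraction of training pixels in this pixel's bin that were skin."""
        key = self._key(rgb)
        s, n = self.skin[key], self.non_skin[key]
        return s / (s + n) if s + n else 0.0
```

Thresholding the score at different values yields the operating points from which an ROC curve can be built. With `keep_illuminance=False` the model becomes two-dimensional, and the `bins` parameter corresponds to the histogram size.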

Dataset

A dataset of 845 images with 18.6 million pixels was used to compute the performance. The dataset was composed of 4.9 million skin pixels and 13.7 million non-skin pixels. The images with skin pixels were collected from the AR face dataset, the UOPB dataset, and the University of Chile dataset. Images containing no skin at all were selected at random from the University of Washington content-based image retrieval database. Below is a sample of the images from each dataset.

AR dataset (indoor images with different levels of lighting)

no extra, extra on right, extra on both sides

UOPB dataset (indoor images with different lighting materials)

incandescent light, daylight, horizon light, fluorescent light

University of Washington (outdoor images of non-skin pixels)

  

University of Chile (various outdoor scenes of people from the web and digitized movie clips)

Ground Truth

The ground truth (GT) is defined at the pixel level. Three labels are used to mark pixels as skin (black), non-skin (white), or don't-care (gray). The don't-care label is assigned to pixels that are too ambiguous or too tedious to label as either skin or non-skin. Below is a sample of the ground truth.
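As an illustration of how such three-level ground truth could be consumed, the sketch below decodes a grayscale GT value into a label and skips don't-care pixels when collecting samples. The exact gray values in the GT files are not stated here, so the cutoffs `low` and `high` are assumptions, as are the names `decode_gt` and `labeled_pixels`.

```python
SKIN, NON_SKIN, DONT_CARE = 1, 0, -1


def decode_gt(gray, low=64, high=192):
    """Map an 8-bit gray level to a label.

    Assumed encoding: dark values mean skin (black), bright values mean
    non-skin (white), and anything in between is don't-care (gray).
    """
    if gray <= low:
        return SKIN
    if gray >= high:
        return NON_SKIN
    return DONT_CARE


def labeled_pixels(image, ground_truth):
    """Yield (pixel, is_skin) pairs, skipping don't-care pixels.

    `image` and `ground_truth` are nested lists (rows of pixels) of equal shape.
    """
    for image_row, gt_row in zip(image, ground_truth):
        for pixel, gray in zip(image_row, gt_row):
            label = decode_gt(gray)
            if label != DONT_CARE:
                yield pixel, label == SKIN
```

Excluding don't-care pixels in this way keeps them out of both training and the true-positive and false-positive counts.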

Evaluation and Results

The data was divided into 10 train/test folds. For each fold, 90% of the data is used for training the classifiers and 10% for testing the performance. We evaluate the performance of the classifier by counting the number of true positives and false positives for several different threshold parameters of the classifier. From the performance at each threshold, we construct a receiver operating characteristic (ROC) curve and compute the area under the curve (AUC) as the performance measure. We do this for each fold and for each combination of classifier, colorspace, and illuminance setting (with or without the illuminance component). We compute the average testing AUC of the 10 folds for each combination. For the histogram classifier, we select the bin size with the highest average training performance.
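The evaluation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each test pixel has a real-valued skin score and a binary label, sweeps a caller-supplied list of thresholds, and integrates the resulting ROC points with the trapezoidal rule. The function names are hypothetical, and the way folds are drawn (every k-th sample) is an assumption, since the splitting scheme is not specified.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule.

    The end points (0, 0) and (1, 1) are added so the curve spans the
    whole false-positive axis.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def fold_splits(n, k=10):
    """Yield (train_indices, test_indices) for k folds over n samples.

    Assumed scheme: sample i belongs to test fold i % k.
    """
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def mean_auc(fold_aucs):
    """Average the per-fold testing AUC values."""
    return sum(fold_aucs) / len(fold_aucs)
```

For example, `auc(roc_points(scores, labels, [0.0, 0.5, 1.0]))` gives the AUC for one fold, and `mean_auc` averages the per-fold values for one combination.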

Below is a table showing the area under the curve (AUC) for each combination. An AUC of 1.0 is perfect and 0.0 is the worst. Beside each AUC, in parentheses, is its rank among the 36 combinations.

 

          Normal                          Histogram
          3D              2D              3D              2D
CIELAB    0.889582 (18)   0.899549 (12)   0.907608 (6)    0.894391 (16)
CIEXYZ    0.861596 (27)   0.848372 (35)   0.894707 (15)   0.876083 (20)
HSI       0.843751 (36)   0.85416 (31)    0.947461 (1)    0.939184 (2)
NRGB      0.874557 (22)   0.878295 (19)   0.89405 (17)    0.897082 (14)
RGB       0.862084 (26)   0.875932 (21)   0.89825 (13)    0.904549 (10)
SCT       0.91291 (4)     0.905784 (9)    0.931799 (3)    0.912142 (5)
YCbCr     0.862135 (24)   0.851044 (32)   0.906634 (7)    0.855645 (28)
YIQ       0.862089 (25)   0.850979 (34)   0.901814 (11)   0.855302 (30)
YUV       0.862159 (23)   0.851041 (33)   0.906622 (8)    0.855598 (29)

Below are the ROC curves of the best-performing combination in red (HSI 3D histogram) and the worst-performing combination in green (HSI 3D Normal).

Downloads

People