Task-Based Evaluation of Skin Detection for Communication and Perceptual Interfaces


Abstract

    Skin detection is frequently used as the first step for the tasks of face and gesture recognition in perceptual interfaces for human-computer interaction and communication. Thus, it is important for researchers using skin detection to choose the optimal method for their specific task. In this paper, we propose a novel method of measuring the performance of skin detection for a task. We have created an evaluation framework for the task of hand detection and executed this assessment using a large dataset containing 17 million pixels from 225 images taken under various conditions. The parameter set of the skin detection has been trained extensively. Five colorspace transformations, with and without the luminance component, coupled with two color modeling approaches have been evaluated. The results indicate that the best performance is achieved by transforming to the SCT colorspace, using the luminance component, and modeling the distribution with the histogram approach. Some conclusions, such as the SCT colorspace being one of the best colorspaces, are consistent with our previous work, while other findings differ: the YUV colorspace performs well in this work although it was one of the worst in our previous work. This indicates that the performance measured at the pixel level might not be the ultimate indicator of the performance at the task level of hand detection. We believe that the users of skin detection will find our task-based results to be more relevant than the traditional pixel-level results. We acknowledge that an evaluation is limited by its specific dataset and evaluation protocols.

Dataset

    The dataset consists of 225 images, comprising over 17 million pixels, collected using the Digiclops range camera. It includes indoor and outdoor images of people making hand poses, as well as images with no people in the scene. Each image is categorized by scene type, illumination, hand pose, and skin tone. Below is a sample of the dataset. The dataset can be downloaded from the download section.

Method

    We evaluate a hand detection method that takes a disparity image and a color image as input. Skin detection is performed on the color image, and small noise regions in the skin detector's output are removed. The connected skin component that is closest to the camera and falls within a certain size range is considered the hand.

Steps of hand detection.
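The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the size thresholds, the use of a minimum component size as the noise filter, and the use of mean disparity as the measure of closeness (larger disparity meaning closer to the camera) are all hypothetical choices made for the example.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected regions of truthy cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def detect_hand(skin_mask, disparity, min_size=3, max_size=50):
    """Pick the skin component closest to the camera within the size limits.

    Returns a half-open bounding box (top, left, bottom, right), or None.
    Assumption: components smaller than min_size are treated as noise, and
    a larger mean disparity means the component is closer to the camera.
    """
    best, best_disparity = None, None
    for pixels in connected_components(skin_mask):
        if not (min_size <= len(pixels) <= max_size):
            continue  # too small (noise) or too large to be a hand
        mean_disparity = sum(disparity[y][x] for y, x in pixels) / len(pixels)
        if best_disparity is None or mean_disparity > best_disparity:
            best, best_disparity = pixels, mean_disparity
    if best is None:
        return None
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)
```

Here `skin_mask` stands for the binary output of any skin detector and `disparity` for the disparity image aligned with it, both given as lists of rows.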

 The statistical skin detectors in these experiments vary in three ways: the color model (histogram or normal density), the colorspace (CIELAB, HSI, RGB, SCT, or YUV), and whether the luminance component is used. Each of the resulting 20 combinations is evaluated for the task of hand detection using ROC analysis. A detection is considered a true positive when the ground truth bounding box overlaps with the detected hand bounding box by 75 percent.
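To make the evaluation protocol concrete, the sketch below enumerates the 20 detector configurations and applies an overlap test to a pair of bounding boxes. Several details are assumptions made for illustration: the overlap is measured as the fraction of the ground truth box covered by the detected box, 75 percent is treated as a minimum, boxes are half-open `(top, left, bottom, right)` tuples, and the `auc` helper is a generic trapezoidal rule over ROC points rather than the exact procedure used in the experiments.

```python
from itertools import product

COLOR_MODELS = ("normal", "histogram")
COLORSPACES = ("CIELAB", "HSI", "RGB", "SCT", "YUV")
USE_LUMINANCE = (True, False)

# 2 color models x 5 colorspaces x 2 luminance settings = 20 detectors
CONFIGS = list(product(COLOR_MODELS, COLORSPACES, USE_LUMINANCE))


def box_area(box):
    """Area of a half-open box (top, left, bottom, right); 0 if empty."""
    top, left, bottom, right = box
    return max(0, bottom - top) * max(0, right - left)


def overlap_fraction(truth, detected):
    """Fraction of the ground truth box covered by the detected box."""
    intersection = (
        max(truth[0], detected[0]),
        max(truth[1], detected[1]),
        min(truth[2], detected[2]),
        min(truth[3], detected[3]),
    )
    area = box_area(truth)
    return box_area(intersection) / area if area else 0.0


def is_true_positive(truth, detected, threshold=0.75):
    """Assumed criterion: the overlap fraction reaches the threshold."""
    return detected is not None and overlap_fraction(truth, detected) >= threshold


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

A different overlap definition, such as intersection over union, would change only `overlap_fraction`; the rest of the sketch would stay the same.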

Results

Below is a table of the rankings of the 20 skin detectors and the ROC curves for the best and worst skin detectors. Each entry is the area under the curve (AUC), with its ranking in parentheses. The 3D columns correspond to colorspaces with the luminance component and the 2D columns to colorspaces without it.

              Normal                    Histogram
colorspace    3D           2D           3D           2D
CIELAB        0.052 (20)   0.057 (19)   0.443 (2)    0.384 (6)
HSI           0.289 (14)   0.249 (16)   0.439 (3)    0.309 (12)
RGB           0.354 (10)   0.233 (18)   0.374 (7)    0.263 (15)
SCT           0.336 (11)   0.247 (17)   0.450 (1)    0.303 (13)
YUV           0.366 (8)    0.356 (9)    0.387 (5)    0.400 (4)

The three best skin detectors used histogram color modeling and the colorspaces SCT, CIELAB, and HSI with the luminance component. The four worst skin detectors used the normal density color model. These findings are similar to those of our pixel-level evaluation, with the surprising exception that YUV does relatively well under both color models.
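The rankings can be reproduced directly from the AUC values. The short sketch below copies the values from the table, sorts the 20 detectors, and extracts the three best and four worst. The dictionary layout and function name are illustrative choices; the `"3D"` and `"2D"` labels are kept as they appear in the table headings.

```python
# AUC values from the results table:
# (color model, dimensionality) -> colorspace -> AUC
AUC = {
    ("normal", "3D"): {"CIELAB": 0.052, "HSI": 0.289, "RGB": 0.354,
                       "SCT": 0.336, "YUV": 0.366},
    ("normal", "2D"): {"CIELAB": 0.057, "HSI": 0.249, "RGB": 0.233,
                       "SCT": 0.247, "YUV": 0.356},
    ("histogram", "3D"): {"CIELAB": 0.443, "HSI": 0.439, "RGB": 0.374,
                          "SCT": 0.450, "YUV": 0.387},
    ("histogram", "2D"): {"CIELAB": 0.384, "HSI": 0.309, "RGB": 0.263,
                          "SCT": 0.303, "YUV": 0.400},
}


def ranked_detectors(table):
    """Flatten the table and sort detectors from highest to lowest AUC."""
    flat = [(value, model, dims, space)
            for (model, dims), row in table.items()
            for space, value in row.items()]
    return sorted(flat, reverse=True)


ranking = ranked_detectors(AUC)
top_three = ranking[:3]
worst_four = ranking[-4:]

for rank, (value, model, dims, space) in enumerate(ranking, start=1):
    print(f"{rank:2d}  {space:6s} {model:9s} {dims}  {value:.3f}")
```

Sorting this way also makes it easy to compare groups, for example by filtering `ranking` on the color model or on a single colorspace such as YUV.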

We also analyzed performance by image type and found that the algorithm performed better on images of dark-skinned people outdoors and performed its worst on images of light-skinned people indoors.

Download

People