See also Grant Scott's Master's Report: "The Application of Morphological Shared-Weight Neural Networks for Face Recognition", Computer Engineering and Computer Science Dept., University of Missouri, Columbia, MO, Jan. 2003.
Current face recognition systems perform poorly under real-world conditions. Performance often declines as the target image departs from a straight-on capture of the face, and many systems can be "fooled" by simple variations in ambient lighting level, facial expression, and various forms of facial occlusion (hats, facial hair, sunglasses, etc.). To overcome these limitations, we designed a Morphological Shared-Weight Neural Network (MSNN) capable of learning faces from gray-scale images. The MSNN is unaffected by shifts in light level and shows very high reliability under target tilt, rotation, and occlusion.
The MSNN is a heterogeneous network composed of two cascaded sub-networks: a feature extraction network and a classification network. The feature extraction layer takes a two-dimensional array, the input sub-image, as its input. This input is passed through kernels that can perform a linear or non-linear mapping; these kernels are the morphological structuring elements. Each sub-image presented to the network is passed through both the hit and the miss kernel. Together, these structuring elements compose the input weights of the next layer, a feature map. The combination of structuring kernels and feature maps performs the gray-scale hit-miss transform, whose result is the output of the feature extraction phase of the MSNN. This output is the direct input to a classic feed-forward neural network (FFN).
The feature extraction and classification networks are trained together, so the MSNN learns feature extraction and classification for a face simultaneously. The MSNN is trained with a set of 20-35 images that show the face in numerous orientations. The ability to recognize an occluded face is therefore strictly a property of the network architecture, not of training. The classification FFN produces a fuzzy output: the confidence that an input sub-image is the desired target face.
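The gray-scale hit-miss transform described above can be illustrated with a minimal sketch. It assumes one common formulation, in which the response for a window is the gray-scale erosion by the hit element minus the dilation term for the miss element; the exact definition used in the report is not given here. The function names `hit_miss_response` and `feature_map` are hypothetical, and the kernels are fixed by hand rather than learned.

```python
def hit_miss_response(patch, hit, miss):
    """Gray-scale hit-miss response for one window.

    All three arguments are equal-size 2-D lists of numbers.
    Assumed formulation: min(patch - hit) - max(patch - miss).
    """
    # Erosion by the hit element: smallest difference between image and hit kernel.
    erosion = min(p - h
                  for p_row, h_row in zip(patch, hit)
                  for p, h in zip(p_row, h_row))
    # Dilation term for the miss element: largest difference between image and miss kernel.
    dilation = max(p - m
                   for p_row, m_row in zip(patch, miss)
                   for p, m in zip(p_row, m_row))
    return erosion - dilation


def feature_map(image, hit, miss):
    """Slide the hit/miss pair over the image and collect one response per position."""
    k_rows, k_cols = len(hit), len(hit[0])
    out = []
    for r in range(len(image) - k_rows + 1):
        out_row = []
        for c in range(len(image[0]) - k_cols + 1):
            # Cut out the window whose top-left corner is (r, c).
            window = [row[c:c + k_cols] for row in image[r:r + k_rows]]
            out_row.append(hit_miss_response(window, hit, miss))
        out.append(out_row)
    return out
```

Under this formulation, adding the same constant to every pixel changes the erosion and dilation terms by the same amount, so the response is unchanged. This is consistent with the insensitivity to light-level shifts noted above. In the MSNN itself the hit and miss kernels are weights learned jointly with the classifier, which this sketch does not attempt to show.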
To use this output, we scan an entire test image and create a Detection Image Plane (DIP), a black image with gray and white pixels in which the confidence values are converted to gray-scale. A threshold is applied to the DIP, and the pixels with high values are overlaid onto the input image. The result is an image with the target marked in white in the middle of the face (as seen below). A second output is the BOX image, which is produced by converting the DIP to a binary image at the threshold and applying some post-processing to it. The result is then used to construct a box, the size of the scanning sub-image, centered on the target.
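The scanning procedure above might be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: `confidence` is a hypothetical stand-in for the trained MSNN and is assumed to return a value between 0 and 1, each score is written at the window's center pixel, and the unspecified post-processing is replaced by a simple centroid of the above-threshold pixels.

```python
def detection_image_plane(image, win_rows, win_cols, confidence):
    """Scan the image and store each window's confidence as a gray level (0-255).

    `confidence` is a hypothetical callable standing in for the trained network.
    """
    rows, cols = len(image), len(image[0])
    dip = [[0] * cols for _ in range(rows)]  # starts out black
    for r in range(rows - win_rows + 1):
        for c in range(cols - win_cols + 1):
            window = [row[c:c + win_cols] for row in image[r:r + win_rows]]
            score = max(0.0, min(1.0, confidence(window)))  # clamp to [0, 1]
            # Record the score at the center of the scanned window.
            dip[r + win_rows // 2][c + win_cols // 2] = int(255 * score + 0.5)
    return dip


def box_from_dip(dip, threshold, win_rows, win_cols):
    """Binarize the DIP at `threshold` and return a window-sized box on the target.

    Returns (top, left, bottom, right), or None if no pixel reaches the threshold.
    Assumption: the centroid of the detections stands in for the post-processing.
    """
    hits = [(r, c)
            for r, row in enumerate(dip)
            for c, value in enumerate(row)
            if value >= threshold]
    if not hits:
        return None
    center_r = sum(r for r, _ in hits) // len(hits)
    center_c = sum(c for _, c in hits) // len(hits)
    top = center_r - win_rows // 2
    left = center_c - win_cols // 2
    return (top, left, top + win_rows - 1, left + win_cols - 1)
```

A single centroid only makes sense when one target is present in the image. Handling several faces would need a grouping step, such as connected-component labeling of the binary DIP, before the boxes are drawn.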
[Figure: Test of target face wearing sunglasses. Original input image, Detection Image Plane, boxed target output.]
[Figure: Various other results. Maximum-distance recognition, partial left-side profile, looking down, head tilt, partial occlusion from a balloon, and wearing normal eyeglasses.]