Locating and interpreting faces in images and image sequences is a
difficult problem in machine vision, due to the inherent variability
between and within individuals. The appearance of a face in an image
varies with the identity of the individual, pose, lighting conditions,
and deformations due to expression or speech.
Previous work has shown how the problem can be addressed by using
statistical models which combine shape and intensity variation within a
single framework. These Combined Appearance Models [4] account for all
sources of variability in face images. We are
interested in isolating the specific sources of variation present in
face images, in order to improve identity recognition in the presence of
pose, lighting and expression variation, and to allow more robust
tracking, by modelling the dynamics of different sources of variability
separately. We show how a discriminant analysis method [4] can be used
to achieve this to a first-order approximation by assuming
the sources of variation are orthogonal and identical for different
individuals. This last assumption is necessary because it is
unrealistic to assume a training set large enough, for every
individual, to determine a class-specific model of
variability. We describe how, using image sequences, the first-order
approximation to the separation of sources of variability can be
improved with a class-specific correction, to give a class-specific
representation for particular individuals. This allows a more precise
description of identity, and better decoupling of the sources of
variation. The decoupling is used to provide separate dynamic models of
variation for sequences which can be used in a Kalman filtering
framework. We show an example of the method used to track a face in an
image sequence, achieving robust tracking, and yielding a more precise
estimate of identity.
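The first-order separation described above can be sketched with a standard Fisher linear discriminant computed on appearance-model parameter vectors. This is an illustrative implementation, not the paper's code: the function name, the use of a single pooled within-class scatter matrix (embodying the assumption that within-individual variation is identical across individuals), and the numpy-only formulation are all assumptions.

```python
import numpy as np

def identity_subspace(params, labels):
    """First-order separation of identity from residual variation.

    params: (n_samples, n_dims) appearance-model parameter vectors
    labels: (n_samples,) identity label for each sample
    Uses one pooled within-class scatter matrix, i.e. assumes the
    within-individual variation is the same for every individual.
    """
    classes = np.unique(labels)
    mean = params.mean(axis=0)
    n_dims = params.shape[1]
    Sw = np.zeros((n_dims, n_dims))  # pooled within-class scatter
    Sb = np.zeros((n_dims, n_dims))  # between-class (identity) scatter
    for c in classes:
        pc = params[labels == c]
        mc = pc.mean(axis=0)
        Sw += (pc - mc).T @ (pc - mc)
        d = (mc - mean)[:, None]
        Sb += len(pc) * (d @ d.T)
    # Directions maximising the Fisher ratio of between- to
    # within-class variance approximate the identity subspace.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:len(classes) - 1]]
```

Projecting a new parameter vector onto the returned directions gives a first-order identity estimate; the residual carries the pose, lighting and expression variation.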
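The Kalman filtering framework mentioned above can be illustrated by tracking a single decoupled appearance parameter with a constant-velocity filter; each separated source of variation could be given its own filter with dynamics matched to how fast that source changes. This is a minimal sketch under assumed noise levels (`q`, `r`) and state model, not the paper's tracker.

```python
import numpy as np

def kalman_track(observations, q=1e-3, r=1e-1):
    """Filter a sequence of noisy scalar parameter observations.

    State x = [value, velocity]; q and r are hypothetical process
    and measurement noise variances.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity dynamics
    H = np.array([[1.0, 0.0]])              # we observe the value only
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.array([observations[0], 0.0])
    P = np.eye(2)
    estimates = []
    for z in observations:
        # Predict the next state and its uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new measurement.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + (K @ (np.array([z]) - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)
```

Running one such filter per decoupled source, with source-specific `q` (e.g. small for identity, larger for expression), is one way the separate dynamic models could be realised.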
Gareth J Edwards