Model-based Face Tracking


It's happened to all of us. You've wound up in a hole-in-the-wall restaurant you already regret going into, and you're worried about creepy-crawlies. Then there! On the floor! A really big scary bug... Oh wait, it's just a piece of fuzz.
Model-based computer vision draws computational inspiration from this story. (No, we're not going to see bugs everywhere!) A computer system can use what it expects to see to help determine what it does see. In particular, using a model which describes the objects in a scene makes scene understanding easier. Of course, using a model in the wrong situation can produce the wrong interpretation. (Ok, so we might see a few bugs.)

Now consider the problem of tracking the motion of an observed human face. We could use a general-purpose object tracker for this application (one which simply sees the face as a deforming surface), but this would ignore everything we know about faces, in particular their highly constrained appearance and motion.

Face models and model-based estimation

Instead, a model-based approach to this problem uses a model which describes the appearance, shape, and motion of faces to aid in estimation. This model has a number of parameters (basically, "knobs" of control), some of which describe the shape of the resulting face, and others its motion.
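
To make this concrete, here is a minimal sketch of what such a parameterized model might look like. This is only an illustration: the vertex and parameter counts, the names (base_face, shape_basis, motion_basis), and the simple linear blending are made-up placeholders, not the actual model described on this page.

import numpy as np

# Illustrative dimensions only -- placeholders, not the real model's sizes.
NUM_VERTICES = 500        # 3D points on the face mesh
NUM_SHAPE_PARAMS = 60     # "knobs" controlling the identity of the face
NUM_MOTION_PARAMS = 40    # "knobs" controlling expression and head motion

# Hypothetical model data: a default (mean) face, plus bases describing how
# each parameter perturbs the mesh vertices.
base_face    = np.zeros((NUM_VERTICES, 3))
shape_basis  = np.random.randn(NUM_SHAPE_PARAMS, NUM_VERTICES, 3) * 0.01
motion_basis = np.random.randn(NUM_MOTION_PARAMS, NUM_VERTICES, 3) * 0.01

def face_vertices(shape_params, motion_params):
    """Return the 3D mesh produced by a given setting of the knobs.

    Shape parameters deform the default face toward a specific individual;
    motion parameters add expressions (smiles, brow raises, ...) on top.
    """
    verts = base_face.copy()
    verts += np.tensordot(shape_params, shape_basis, axes=1)
    verts += np.tensordot(motion_params, motion_basis, axes=1)
    return verts

# The default model: all knobs set to zero.
neutral = face_vertices(np.zeros(NUM_SHAPE_PARAMS), np.zeros(NUM_MOTION_PARAMS))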

In the picture below, the default model (top, center) can be made to look like specific individuals by changing shape parameters (the 4 faces on the right). The model can also display facial motions (the 4 faces on the left, showing an eyebrow frown, an eyebrow raise, a smile, and an open mouth) by changing motion parameters. And of course, we can change both shape and motion parameters simultaneously (bottom, center).


A model of face shape and motion

Now you might be asking yourself: how can this model be detailed enough to represent any person making any expression? The answer is: it can't. But it can represent any of these faces to an acceptable degree of accuracy. The benefit of this simplifying assumption is that we can have a fairly small set of parameters (about 100) which describe a face. This results in a more efficient and more robust system.

Estimating parameters from images using a model calls for new processing methods (which are still related to their model-free counterparts). For example, computing the motion of an object using optical flow without a model produces a field of arrows. When a model is used, however, a set of "parameter velocities" is extracted instead.
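
Here is a rough sketch of how flow measurements might be converted into parameter velocities, assuming the model can supply the Jacobian relating the image motion of its points to changes in its parameters. The function name and the damped least-squares solve are illustrative choices, not the exact formulation used in our framework.

import numpy as np

def parameter_velocities(flow, jacobian, damping=1e-3):
    """Convert a field of optical-flow arrows into parameter velocities.

    flow     : (N, 2) measured image velocities at N model points.
    jacobian : (N, 2, P) derivative of each point's image position with
               respect to the P model parameters (assumed to be supplied
               by the face model).
    Returns the least-squares parameter velocities q_dot (length P).
    """
    N, _, P = jacobian.shape
    J = jacobian.reshape(N * 2, P)   # stack all flow constraints
    b = flow.reshape(N * 2)
    # Damped least squares -- a standard choice; the published method may
    # weight and regularize the constraints differently.
    q_dot = np.linalg.solve(J.T @ J + damping * np.eye(P), J.T @ b)
    return q_dot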


Results

Below are some of our face tracking results (this work is joint with Dimitris Metaxas). In each experiment, the tracked subject moves their head and makes some facial expressions (currently, the system is not real-time). The initial placement of the face model in each sequence is done by hand (by identifying a small set of feature points).
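
As an illustration of this initialization step, the following sketch fits model parameters to a handful of hand-identified feature points by minimizing reprojection error. The function predict_points is a hypothetical stand-in for the model's own projection of those features; the actual procedure used here may differ.

import numpy as np
from scipy.optimize import least_squares

def initialize_from_features(clicked_points, predict_points, num_params):
    """Fit initial model parameters to a few hand-identified feature points.

    clicked_points : (K, 2) image locations identified by hand
                     (e.g. eye corners, nose tip, mouth corners).
    predict_points : function mapping a parameter vector to the (K, 2)
                     image locations of the corresponding model features
                     (a stand-in for the face model's projection).
    """
    def residual(params):
        # Difference between where the model puts the features and where
        # the user clicked, flattened for the least-squares solver.
        return (predict_points(params) - clicked_points).ravel()

    result = least_squares(residual, x0=np.zeros(num_params))
    return result.x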

In each case, the estimated model (the black wireframe) is superimposed on the captured images. The subjects make complex facial and head motions, which are successfully tracked by our framework.


[444K QuickTime movie]


[505K QuickTime movie]


[556K QuickTime movie]

(I would like to thank Yaser Yacoob and the Center for Automation Research at the University of Maryland at College Park for providing the second and third image sequences.)

Validation
We have also performed validation studies where we mark a subject with dots (top), extract the positions of these markers (middle: marker locations are shown as white dots), and compare these positions with the model-predicted locations of the markers.

Our results show that our method can maintain tracking accuracy of about one-half centimeter (in the image plane) over long sequences (many hundreds of frames) without drifting or losing track.
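
For reference, a measurement like this could be computed as sketched below. The cm_per_pixel scale factor and the array layout are assumptions for illustration; the study itself has its own calibration and measurement details.

import numpy as np

def image_plane_error_cm(tracked_markers, predicted_markers, cm_per_pixel):
    """Per-frame RMS distance (in cm) between extracted marker locations
    and the locations predicted by the fitted model.

    tracked_markers, predicted_markers : (F, M, 2) image coordinates for
        F frames and M markers.
    cm_per_pixel : assumed image-plane scale factor from a prior calibration.
    """
    diff = tracked_markers - predicted_markers              # (F, M, 2)
    per_frame_rms = np.sqrt((diff ** 2).sum(-1).mean(-1))   # pixels
    return per_frame_rms * cm_per_pixel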


Results from validation study

Why track a face?

Well, being able to track a face from images makes it possible to monitor a user's attention and reactions automatically and without intrusion, which has obvious benefits for human-machine interaction. This is the subject of interaction research underway at THE VILLAGE.


Publications

A number of on-line publications describing this work are available (listed in chronological order):


Links


