Pre-Defense
5/18/2018 02:00 pm
CBIM 22

Learning human facial performance: analysis and synthesis

Hai Pham, Dept. of Computer Science

Defense Committee: Prof. Vladimir Pavlovic (Chair), Prof. Dimitris Metaxas, Prof. Ahmed Elgammal, Prof. Jiebo Luo (University of Rochester)

Abstract

Human faces convey a wide range of semantic meaning through facial expressions, which reflect both actions and affective states. Understanding human facial expressions has therefore been a premier research area in the computer vision and machine learning communities for decades, yielding significant advances in face tracking, reconstruction, and synthesis. In this dissertation, we study two aspects of face modeling: the reconstruction of human facial expressions via an interpretable 3D blendshape representation from different input modalities, and the reverse problem, in which we train a model to directly hallucinate coherent facial expressions given an arbitrary portrait and facial action parameters. In the first part, we present a robust, real-time 3D face tracking framework for RGBD video that tracks head pose and facial actions and adapts the identity model on the fly, without any user intervention. The tracker is driven by an efficient and flexible 3D shape regressor and remains reliable even in extreme cases. In the second part, we present deep recurrent neural network frameworks that predict facial action intensities from speech for real-time animation. By learning facial actions, these models also implicitly estimate the affective state of the target speaker. Finally, we present a novel deep generative neural network that enables fully automatic facial expression synthesis for an arbitrary portrait, controlled by continuous action unit (AU) coefficients. Our model directly manipulates image pixels so that the unseen subject in the still photo expresses various emotions governed by the values of the AU coefficients, while preserving the subject's personal characteristics. The proposed model is purely data-driven, requiring neither a statistical face model nor image-processing tricks to enact facial deformations. Furthermore, it is trained from unpaired data, where the input image and the target expression frame come from different persons. This work enables flexible, effortless facial expression editing.
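
The 3D blendshape representation referenced throughout the abstract can be thought of as a linear model in which facial action unit (AU) coefficients weight a set of expression displacement bases added to a neutral, identity-adapted mesh. The sketch below is only illustrative of that parameterization; the array shapes, the AU count, and the function name are assumptions for the example, not the dissertation's actual implementation.

```python
import numpy as np

# Illustrative sizes: V vertices, K expression blendshapes (one per action unit).
V, K = 5000, 46                     # 46 is a common blendshape count; purely an assumption here
neutral = np.zeros((V, 3))          # neutral, identity-adapted face mesh
blendshapes = np.zeros((K, V, 3))   # per-AU vertex displacements from the neutral mesh

def synthesize_mesh(au_coeffs: np.ndarray) -> np.ndarray:
    """Linear blendshape model: neutral face plus a weighted sum of
    AU displacement bases, with weights clipped to [0, 1]."""
    au_coeffs = np.clip(au_coeffs, 0.0, 1.0)
    # Contract the K-dimensional coefficient vector against the K blendshapes.
    return neutral + np.tensordot(au_coeffs, blendshapes, axes=1)

# Example: activate a single hypothetical AU (say, "jaw open") at 70% intensity.
coeffs = np.zeros(K)
coeffs[0] = 0.7
mesh = synthesize_mesh(coeffs)      # (V, 3) deformed vertex positions
```

In this view, the tracking and speech-driven animation parts of the dissertation estimate the AU coefficient vector from RGBD or audio input, while the generative synthesis part conditions an image-to-image model on the same kind of coefficients instead of deforming an explicit mesh.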