Learning reliable and interpretable representations underlies the recent success of deep neural network models. However, many latent factors remain highly entangled in the representation space. A direct mapping from the source to the target would require extensive annotation and tedious training, and often yields limited performance when large variations exist. In this study, we investigate features learned by deep neural networks and leverage domain knowledge to factorize and disentangle latent factors in a supervised or weakly supervised manner. We show that the disentangled representation is efficient to learn and robust to variations, and that it achieves state-of-the-art performance on multiple face analysis tasks, including head pose estimation, facial landmark tracking, and deep face recognition.