PhD Defense: Towards Efficient and Reliable Skeleton-Based Human Pose Modeling
Wednesday, November 03, 2021, 11:00am - 01:00pm
Speaker: Long Zhao
Location: Via Zoom
Committee:
Prof. Dimitris N. Metaxas (Advisor)
Prof. Mubbasir Kapadia
Prof. Hao Wang
Prof. Xiaolei Huang (External member, Penn State University)
Event Type: PhD Defense
Abstract: Understanding human behaviors with deep neural networks has been a central task in computer vision due to its wide application in daily life. Existing studies have explored various modalities for learning powerful feature representations of human poses, such as RGB frames, optical flows, depth images, and human skeletons. Among them, skeleton-based pose representation has received increasing attention in recent years thanks to its action-focusing nature, compactness, and domain-invariant property. However, prevalent skeleton-based algorithms are typically not only inefficient in terms of network parameters or training data, but also unreliable in human action forecasting problems. In this dissertation, we explore the benefits and challenges of skeleton-based human action modeling and offer novel solutions to achieve efficient and reliable model performance in human action estimation, recognition, and generation tasks.

In the first part of this dissertation, we tackle the problem of model as well as data efficiency in human pose understanding. Given the meaningful topological structure carried by human skeletons, we show that capturing the relationships between joints in the skeleton of a human body with graph neural networks leads to an efficient network architecture that outperforms the state of the art while using 90% fewer parameters. We then present a novel representation learning method that disentangles pose-dependent as well as view-dependent factors from human poses based on mutual information maximization. Empirically, we show that the resulting pose representations can be used in different action recognition scenarios where training data are limited.

In the second part of this dissertation, we explore structure-driven paradigms for making long-term predictions of future human actions by explicitly using skeletons as structural conditions.
Such hierarchical strategies are built upon multi-stage generative adversarial networks and typically lead to more robust and reliable predictions than previous appearance-driven ones. To avoid the compounding errors inherent in recursive pixel-level prediction, we first estimate high-level structure in the input frames and then predict how that structure evolves in the future. By developing specialized network architectures, we are able to capture the high-level structure of actions efficiently while preserving temporal coherence, thereby benefiting long-term future forecasting.
Organization:
Rutgers University School of Arts and Sciences
Contact: Long Zhao