PhD Defense

Towards Efficient and Reliable Skeleton-Based Human Pose Modeling


Wednesday, November 03, 2021, 11:00am - 01:00pm


Speaker: Long Zhao

Location: Via Zoom


Committee:

Prof. Dimitris N. Metaxas (Advisor)

Prof. Mubbasir Kapadia

Prof. Hao Wang

Prof. Xiaolei Huang (External member, Penn State University)

Event Type: PhD Defense

Abstract: Understanding human behaviors with deep neural networks has been a central task in computer vision because of its wide application in daily life. Existing studies have explored various modalities for learning powerful feature representations of human poses, such as RGB frames, optical flows, depth images, and human skeletons. Among them, skeleton-based pose representation has received increasing attention in recent years thanks to its action-focusing nature, compactness, and domain invariance. However, prevalent skeleton-based algorithms are not only inefficient in network parameters or training data but also unreliable in human action forecasting problems. In this dissertation, we explore the benefits and challenges of skeleton-based human action modeling and offer novel solutions to achieve efficient and reliable model performance in human action estimation, recognition, and generation tasks.

In the first part of this dissertation, we tackle the problem of model and data efficiency in human pose understanding. Given the meaningful topological structure carried by human skeletons, we show that capturing the relationships between joints in the skeleton of a human body with graph neural networks leads to an efficient network architecture that outperforms the state of the art while using 90% fewer parameters. We then present a novel representation learning method that disentangles pose-dependent and view-dependent factors from human poses based on mutual information maximization. Empirically, we show that the resulting pose representations can be used in different action recognition scenarios where training data are limited.

In the second part of this dissertation, we explore structure-driven paradigms for making long-term predictions of future human actions by explicitly using skeletons as structural conditions. Such hierarchical strategies are built upon multi-stage generative adversarial networks and typically lead to more robust and reliable predictions than previous appearance-driven ones. To avoid the compounding errors inherent in recursive pixel-level prediction, we first estimate the high-level structure in the input frames and then predict how that structure evolves in the future. By developing specialized network architectures, we are able to capture the high-level structure of actions efficiently while preserving temporal coherence, thereby benefiting long-term future forecasting.


Rutgers University School of Arts and Sciences

Contact: Long Zhao

Join Zoom Meeting
https://rutgers.zoom.us/j/99981171179?pwd=ZU1LMnVzSWNrd1pNWTVKdm5XRGJmZz09

Meeting ID: 999 8117 1179
Password: 076742

One tap mobile
+13126266799,,99981171179# US (Chicago)
+16465588656,,99981171179# US (New York)

Join By Phone
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
+1 301 715 8592 US (Washington DC)
+1 346 248 7799 US (Houston)
+1 669 900 9128 US (San Jose)
+1 253 215 8782 US (Tacoma)
Find your local number: https://rutgers.zoom.us/u/ab4FHofI6w

Join by Skype for Business
https://rutgers.zoom.us/skype/99981171179