Scalable self-supervised representation learning of world models
Wednesday, June 24, 2020, 11:00am - 12:30pm
Speaker: Sepehr Janghorbani
Location: Remote via Webex
Advisor: Prof. Gerard De Melo
Committee members: Prof. Karl Stratos, Prof. Yongfeng Zhang, Prof. David Pennock
Event Type: Qualifying Exam
Abstract: Humans perceive their surrounding environment through many trial-and-error interactions and minimal explicit supervision. This has inspired a new family of deep models designed mainly for world representation modelling, which combine self-supervised learning techniques with naturally plausible inductive biases. In every world representation model, object detection and tracking are the first and most fundamental steps towards meaningful scene perception. In this talk, I will briefly touch on the main drawbacks of traditional supervised models and give a brief background on recently proposed unsupervised models. I will then introduce our proposed object-centered holistic perception model, which performs object detection, segmentation, tracking, and generation within a single model. This model is trained without any supervision. Previous unsupervised state-of-the-art models have important limitations: they either do not consider the temporal dynamics of the environment or do not scale to crowded scenes with many objects. Our proposed probabilistic generative model (1) significantly improves tracking scalability over state-of-the-art models, by two orders of magnitude, to nearly a hundred objects; (2) reduces computation time from O(N) to O(1); (3) is able to model scenes with complex dynamic backgrounds; (4) is the first unsupervised object representation model shown to work for natural scenes containing several tens of moving objects; and (5) is shown to work reasonably well on both a synthetic dataset and real CCTV camera footage. Finally, I will explore how this unsupervised approach can be connected to applications in task-specific language semantic understanding and planning domain generation.