PhD Defense

Towards Zero-Shot Learning: Object Recognition with Semantic Descriptions


Friday, April 23, 2021, 10:30am - 12:30pm


Speaker: Yizhe Zhu

Location: Remote via Zoom


Committee:

Prof. Ahmed Elgammal (advisor and chair)

Prof. Yongfeng Zhang

Prof. Gerard de Melo

Prof. Haibin Ling (external member, Stony Brook University)

Event Type: PhD Defense

Abstract: Human beings have the remarkable ability to recognize novel visual objects based only on descriptions in terms of semantic concepts. Deep learning, however, often requires a large number of labeled training samples to achieve superior performance. Collecting sufficient labeled data for each object is laborious, expensive, and difficult because the distribution of data among categories in the real world is long-tailed. There is therefore great demand for methods that recognize novel visual objects without having seen any visual data for them. Zero-shot learning aims to extend the knowledge learned from seen objects, for which we have training samples, to unseen objects, for which we have no training data. To mimic the human ability to recognize novel objects, zero-shot learning leverages semantic descriptions of classes to bridge the gap between seen and unseen classes.

This thesis addresses zero-shot learning from three aspects: the semantic descriptions of classes, the discriminative visual representation of images, and the algorithm design. We first present a visual-semantic embedding-based method that embeds visual data and semantic data into a shared embedding space. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. A sparsity regularizer forces the model to connect text terms to their relevant parts and to suppress connections to non-visual text terms, without any part-text annotations. We then propose a novel algorithm that leverages generative models to synthesize visual data for unseen categories from semantic descriptions, converting zero-shot learning into a conventional object recognition problem. Finally, we shift our focus to learning better discriminative part-level visual representations of images for zero-shot learning in a weakly supervised manner. Experimental results show that our models significantly improve the performance of zero-shot learning in different settings.


Join Zoom Meeting

Meeting ID: 676 297 2532
Password: 079032
One tap mobile
+16465588656,,6762972532# US (New York)
+13017158592,,6762972532# US (Washington DC)

Join By Phone
+1 646 558 8656 US (New York)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 669 900 9128 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
Meeting ID: 676 297 2532