
PhD Defense

Towards Visual Learning with Attention Mechanism



Wednesday, March 24, 2021, 12:00pm - 02:00pm


Speaker: Lezi Wang

Location: Remote via Zoom


Committee:

Prof. Dimitris N. Metaxas (Advisor)

Prof. Konstantinos Michmizos

Prof. Georgios Moustakides

Prof. John Yiannis Aloimonos (external member)

Event Type: PhD Defense

Abstract: Deep learning has attracted tremendous interest in the computer vision research community. Deep convolutional neural networks (CNNs) have achieved astonishing results on a variety of vision tasks, but several problems remain open. First, CNN models are perceived as "black boxes" whose internal function is poorly understood. Class-wise activation maps were recently proposed to give a visual explanation of model attention, but there is still no way to use that explanation to guide the learning process. Additionally, the success of deep learning relies on supervised training of models on large-scale data, which requires humans to create massive numbers of annotations. In this dissertation, we argue that attention mechanisms can play a significant role in addressing these challenges. First, although class-wise attention mechanisms provide good localization for an individual class of interest when interpreting CNNs, they produce attention maps whose responses overlap substantially among different classes, leading to visual confusion and motivating the need for discriminative attention. We address this problem with a new framework that makes class-discriminative attention a principled part of the learning process. Second, to remove the need for human annotations, we introduce Co-Attention as weak supervision to generate positive and negative training samples, and a Contrastive Attention module that enhances the feature representations so that the contrast between the features of positive and negative samples is maximized. Third, we adopt attention in feature space to bridge different vision tasks in a unified learning framework. Extensive experiments on vision benchmarks show the effectiveness of our approaches in terms of improved image classification and segmentation accuracy.
In terms of applications, our proposed algorithms are applied to the unsupervised detection of highlighted segments in videos, joint face detection and landmark localization, and reasoning about human facial behaviors in deception. Additionally, two new benchmarks are collected to support related studies and facilitate research in the same direction.


Join Zoom Meeting https://us02web.zoom.us/j/78181204644

Meeting ID: 781 8120 4644

Dial by your location
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 669 900 9128 US (San Jose)