CS Events

Qualifying Exam

Open Vocabulary Object Detection with Pretrained Vision and Language Models

 

Tuesday, November 28, 2023, 08:30am - 10:00am

 

Speaker: Shiyu Zhao

Location: CoRE 305

Committee

Professor Dimitris Metaxas (Chair)

Professor Konstantinos Michmizos

Professor Dong Deng

Professor Desheng Zhang

Event Type: Qualifying Exam

Abstract: Recent studies show promising performance in open-vocabulary object detection (OVD) using pseudo labels (PLs) from pretrained vision and language models (VLMs). However, PLs generated by VLMs are extremely noisy because of the gap between the pretraining objective of VLMs and OVD, and this noise blocks further advances with PLs. In this paper, we aim to reduce the noise in PLs and propose a method called online Self-training And a Split-and-fusion head for OVD (SAS-Det). First, the self-training finetunes VLMs to generate high-quality PLs while preventing them from forgetting the knowledge learned during pretraining. Second, a split-and-fusion (SAF) head is designed to remove the localization noise in PLs, which existing methods usually ignore. It also fuses complementary knowledge learned from both precise ground truth and noisy pseudo labels to boost performance. Extensive experiments demonstrate that SAS-Det is both efficient and effective. Our pseudo labeling is three times faster than prior methods. SAS-Det outperforms prior state-of-the-art models of the same scale by a clear margin and achieves 37.4 AP_50 and 27.3 AP_r on novel categories of the COCO and LVIS benchmarks, respectively.

Contact: Professor Dimitris Metaxas (Chair)