CS Events
Qualifying Exam: Open Vocabulary Object Detection with Pretrained Vision and Language Models
Tuesday, November 28, 2023, 08:30am - 10:00am
Speaker: Shiyu Zhao
Location: CoRE 305
Committee:
Professor Dimitris Metaxas (Chair)
Professor Konstantinos Michmizos
Professor Dong Deng
Professor Desheng Zhang
Event Type: Qualifying Exam
Abstract: Recent studies show promising performance in open-vocabulary object detection (OVD) using pseudo labels (PLs) from pretrained vision and language models (VLMs). However, PLs generated by VLMs are extremely noisy because of the gap between the pretraining objective of VLMs and OVD, which blocks further advances with PLs. In this paper, we aim to reduce the noise in PLs and propose a method called online Self-training And a Split-and-fusion head for OVD (SAS-Det). First, the self-training fine-tunes VLMs to generate high-quality PLs while preventing the forgetting of knowledge learned during pretraining. Second, a split-and-fusion (SAF) head is designed to remove the noise in the localization of PLs, which is usually ignored by existing methods. It also fuses complementary knowledge learned from both precise ground truth and noisy pseudo labels to boost performance. Extensive experiments demonstrate that SAS-Det is both efficient and effective. Our pseudo labeling is 3 times faster than prior methods. SAS-Det outperforms prior state-of-the-art models of the same scale by a clear margin and achieves 37.4 AP_50 and 27.3 AP_r on novel categories of the COCO and LVIS benchmarks, respectively.
Contact: Professor Dimitris Metaxas (Chair)