CS Events
Qualifying Exam: Conceptual Explanations for Vision and Language Foundation Models
Friday, December 06, 2024, 02:00pm - 04:00pm
Speaker: Hengyi Wang
Location: CoRE 305
Committee:
Professor Hao Wang (Chair)
Professor Dimitris Metaxas
Professor Yongfeng Zhang
Professor Sharon Levy
Event Type: Qualifying Exam
Abstract: Vision and language foundation models, such as Vision Transformers (ViTs) and Pretrained Language Models (PLMs), have seen significant advances due to their capability to process and understand visual and textual information. However, trustworthy and interpretable explanation methods for these models remain underdeveloped, especially for post-hoc conceptual explanations that span multiple modalities. Our work introduces a unified framework for generating conceptual explanations for vision and language models, addressing core desiderata such as faithfulness and stability. Specifically, we introduce variational Bayesian conceptual explanation methods that model the latent distributions of visual/textual token embeddings, providing post-hoc explanations at the dataset, image, and patch levels. Our analysis reveals how modeling multi-level joint distributions of visual or language embeddings can offer interpretable insights, bridging the gap between vision-language model predictions and human-understandable concepts. Extensive experiments across various benchmarks demonstrate that our approach consistently outperforms existing explanation methods by meeting these desiderata and offering a comprehensive analysis of model predictions.
Contact: Professor Hao Wang