Qualifying Exam

SAViR-T: Spatially Attentive Visual Reasoning with Transformers

 

Wednesday, October 19, 2022, 05:30pm

 

Visual Reasoning (VR) serves as a measure of machine intelligence, since it requires applying previously gained knowledge in new settings. Specifically, in VR we aim to extract and identify task-relevant information from images. For example, in Raven's Progressive Matrices (RPMs), an instance of VR, we are given an incomplete 3×3 image puzzle; to solve it, we must find the governing rules that generated the puzzle. In this talk, we will explore the importance of localized spatial information for solving RPM puzzles. Our proposed model, SAViR-T, considers the explicit spatial semantics of visual elements within each image of the puzzle, encoded as spatio-visual tokens, and learns both intra-image and inter-image token dependencies. The token-wise relationships, modeled by the transformer-based SAViR-T architecture and followed by a reasoning module, are used to extract the underlying rule representations between the rows of the RPM. We then use these relation representations to complete the puzzle. Finally, to demonstrate the efficacy of our approach, we performed extensive experiments on both synthetic datasets (RAVEN, I-RAVEN, and RAVEN-FAIR) and the natural-image-based V-PROM dataset.
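The final step, using row relation representations to complete the puzzle, can be illustrated with a minimal sketch. It assumes each panel has already been reduced to a single embedding vector, and it substitutes a toy, hand-written encoder (the hypothetical `difference_encoder`, which only captures additive progression rules) for SAViR-T's learned transformer and reasoning module. The names `difference_encoder` and `solve_rpm` are illustrative only; this is not the authors' implementation, just one way to pick the answer whose completed third row yields a relation representation closest to those of the first two rows.

```python
from typing import Callable, Sequence

import numpy as np

# A row encoder maps the three panel embeddings of one row to a relation vector.
# In SAViR-T this role is played by learned modules; here it is a hypothetical toy.
RowEncoder = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def difference_encoder(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned rule encoder: successive differences between
    panel embeddings (only captures additive 'progression' rules)."""
    return np.concatenate([b - a, c - b])


def solve_rpm(context: Sequence[np.ndarray],
              candidates: Sequence[np.ndarray],
              encode_row: RowEncoder = difference_encoder) -> int:
    """context: the 8 known panel embeddings of a 3x3 puzzle, in row-major order.
    candidates: embeddings of the answer choices.
    Returns the index of the candidate whose completed third row has the relation
    vector closest (Euclidean distance) to the mean relation of rows 1 and 2."""
    if len(context) != 8:
        raise ValueError("expected 8 context panels for a 3x3 puzzle")
    r1 = encode_row(*context[0:3])
    r2 = encode_row(*context[3:6])
    target = (r1 + r2) / 2.0
    distances = [np.linalg.norm(encode_row(context[6], context[7], cand) - target)
                 for cand in candidates]
    return int(np.argmin(distances))


if __name__ == "__main__":
    # Synthetic puzzle: every row follows the same additive step between panels.
    rng = np.random.default_rng(0)
    step = rng.normal(size=4)
    starts = [rng.normal(size=4) for _ in range(3)]
    panels = [s + k * step for s in starts for k in range(3)]
    context, answer = panels[:8], panels[8]
    candidates = [answer + rng.normal(size=4) for _ in range(7)]  # distractors
    candidates.insert(3, answer)                                  # true answer at index 3
    print(solve_rpm(context, candidates))  # 3
```

Keeping the row encoder as a pluggable argument separates the "extract row rules" step from the "compare rows to choose an answer" step, so a learned encoder could replace the toy one without changing the selection logic.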

Speaker: Kalliopi Basioti

Location: Virtual

Committee

Professor Vladimir Pavlovic

Professor Srinivas Narayana Ganapathy

Professor Hao Wang

Professor Yongfeng Zhang

Event Type: Qualifying Exam

Organization

Rutgers University School of Arts & Sciences

Contact: Professor Vladimir Pavlovic

Zoom Info: https://rutgers.zoom.us/j/96737105699?pwd=NmdiRkZKaGtHbis4SitsdmpxaU9qdz09
Meeting ID: 967 3710 5699
Password: 146162