CS Events

PhD Defense

Scene Graph Parsing and Its Application in CV/NLP Cross-Modal Reasoning



Wednesday, March 25, 2020, 02:00pm - 04:00pm


Speaker: Ji Zhang

Location: Remote via Webex


Committee:

Prof. Ahmed Elgammal (Advisor)

Prof. Yongfeng Zhang

Prof. Gerard de Melo

Dr. Ang Li (External member)

Event Type: PhD Defense

Abstract: Scene graph parsing aims at understanding an image as a graph where vertices are visual objects (potentially with attributes) and edges are visual relationships among objects. This task is commonly seen as an extension of object detection: whereas object detection identifies objects individually, scene graph parsing also requires recognizing relationships between object pairs. Therefore, scene graphs are usually seen as a better semantic representation of images for visual reasoning. In my presentation I will introduce two salient issues that naturally occur in scene graphs: ambiguity in the language dimension and ambiguity in the visual dimension. The first arises when the vocabulary of objects and relationships is very large, and the second arises when multiple vertices or edges in a scene graph belong to the same category, making it hard for the model to recognize the correct relational pairing. Finally, I will talk about how to use an accurately parsed scene graph as an extra channel of features for downstream visual reasoning tasks such as Video Story Question Answering. Concretely, how do we obtain better interpretability by using scene graphs? How much can a scene graph help visual reasoning if it is not perfectly constructed? I will dive deep into these details in the presentation.


Meeting link:

Meeting number: 790 611 008
Password: SeMEJGSR222
Host key: 541912