Qualifying Exam
5/14/2018 10:30 am
CBIM 22

Visual Relationship Detection

Ji Zhang, Dept. of Computer Science

Examination Committee: Prof. Ahmed Elgammal (Chair); Prof. Vladimir Pavlovic; Prof. Gerard De Melo; Prof. Amelie Marian

Abstract

Visual relationships are defined as <subject, predicate, object> triples, in which the “subject” is related to the “object” by the “predicate”. Detecting visual relationships in large-scale settings is challenging for two main reasons. First, a scene with many objects may contain only a few interacting pairs (e.g., in a party image with many people, only a handful might be speaking with each other), so it is inefficient to first detect all individual objects and then classify every pair. Second, a model must handle the widely spread and imbalanced distribution of <subject, predicate, object> triples: in real-world scenarios with large numbers of objects and relations, some triples occur very frequently while others are barely seen. In this talk, we first present a model that reduces the number of candidate relationships to a manageable scale; we then address the second problem with an embedding-based model trained in a discriminative way. Evaluation results show that our first model efficiently eliminates unqualified relationship candidates, and that our second model is competitive on relationship detection for both large-scale and small-scale datasets.
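The inefficiency of exhaustive pairing can be sketched as follows. This is an illustrative toy, not the model presented in the talk: the `relatedness` scorer, threshold, and object names are all hypothetical, standing in for whatever learned candidate filter a real system would use.

```python
# With N detected objects, exhaustive pairing yields N*(N-1) ordered
# (subject, object) candidates -- quadratic growth -- most of which
# do not correspond to any real relationship.
from itertools import permutations

def candidate_pairs(objects):
    """All ordered (subject, object) pairs from a list of detections."""
    return list(permutations(objects, 2))

def prune(pairs, relatedness, threshold=0.5):
    """Keep only pairs a (hypothetical) relatedness scorer deems likely
    to interact, shrinking the set sent to the predicate classifier."""
    return [p for p in pairs if relatedness(*p) >= threshold]

objects = ["person_1", "person_2", "person_3", "dog", "table"]
pairs = candidate_pairs(objects)
print(len(pairs))  # 5 * 4 = 20 candidates before pruning

# Toy scorer: pretend only person-person pairs interact.
def score(subj, obj):
    return 1.0 if subj.startswith("person") and obj.startswith("person") else 0.0

print(len(prune(pairs, score)))  # 6 person-person pairs remain
```

The point of the sketch is the asymmetry: the candidate set grows quadratically in the number of detections, while the set of true relationships is typically small, so a cheap early filter pays for itself before the expensive predicate classification step.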