In the past few years, there has been a tremendous amount of progress in the field of computer vision. As of now, we have reliable object detectors and classifiers that can recognize thousands of object categories. However, the ultimate goal of computer vision is to build systems that can understand and reason about images, far beyond scene categorization and object detection. In this thesis, algorithms have been proposed to empower computers with the human-level ability of detecting and reasoning about typicality of images. Furthermore, abnormality of objects are detected and shown to be helpful to increase the generalization capacity of discriminative object classifiers.
First, we addressed the problem of detecting abnormal objects and reasoning about their abnormality in terms of visual attributes, such as irregular shape, texture or color. This attribute-based reasoning is inspired by human-subject experiments that we conducted prior to building computational models. Although these models are trained without seeing any abnormal objects, but are still capable of detecting and reasoning about abnormal cases at the test time. We collected the first image dataset of abnormal objects, which we used to validate the performance of our models.
Second, we consider the more challenging problem of recognizing atypicality in images. We conducted large-scale human-subject experiments using Amazon Mechanical Turk. We performed a thorough analysis of these human responses. This analysis includes factor analysis of subjects’ opinions and unsupervised clustering of user-generated reasons. We proposed a taxonomy of reasons that make an image look atypical. This taxonomy has three main categories of abnormality reasons: Object-centric, Scene-centric and Context-centric. Inspired by this taxonomy, we developed probabilistic frameworks to model typical images, and find atypical images as meaningful deviation from this model. We rank images based on how typical they appear, detect atypical cases and reason about this decision similar to human reasoning.
Third, we used the typicality scores of images and objects to improve the generalization capacity of the state-of-the-art Convolutional Neural Networks (CNN) for the task of object classification. We trained these CNN models based on a weighting loss function that incorporates in the typicality scores of samples. Our experiments showed that this training strategy results in more generalized classifiers, which can be applied even to the extent of abnormal images.