CS Events

PhD Defense

Computational Modeling for Food Image and Video Understanding

Monday, December 16, 2024, 10:00am - 11:30am

Speaker: Jiatong Li

Location: CBIM 22

Committee

Prof. Vladimir Pavlovic (chair)

Prof. Yongfeng Zhang

Prof. Hao Wang

Dr. Ajay Divakaran (external)

Event Type: PhD Defense

Abstract: Although sophisticated computer vision models have excelled in numerous tasks, they often struggle with images or videos related to food, especially those depicting cooked meals or food preparation. The food domain is challenging because of hidden ingredients, changes in the appearance of ingredients during cooking, and significant variability among images of dishes prepared from the same recipe. Moreover, cooking procedures frequently allow steps to be interchanged or carried out in parallel, which leads to significant variability in videos depicting cooking. Designing computer vision models for applications involving food images and videos therefore requires particular attention to the domain’s inherent complexity.

This dissertation focuses on computational modeling for both cooking images and videos. For images, we study the problem of predicting the relative amounts of the ingredients. We propose the PITA framework, which leverages retrieval features and the Wasserstein distance to solve this problem, and we show that PITA significantly outperforms previous baselines. For videos, we address the challenge of deriving an instruction flow graph for each video, which models the sequencing as well as the parallelism or interchangeability of the cooking steps. We propose Box2flow, which combines features from object detection and BERT to calculate pairwise edge probabilities. A spanning-tree algorithm is then used to predict the flow graph from the edge probabilities for each input video. We show that Box2flow outperforms standard captioning-based methods.

Our research opens several directions for future study, particularly when integrated with Large Language Models. The methodologies developed here for food analysis are adaptable and could be applied to broader computer science challenges involving complex datasets.

Contact: Professor Vladimir Pavlovic (chair)

Join Zoom Meeting
https://rutgers.zoom.us/j/97642704942?pwd=Va17UWESyvYm6hleoua1Iz2RgKcqt9.1

Meeting ID: 976 4270 4942
Password: 005766