CS Events

Qualifying Exam

Exploiting pre-trained large-scale text-to-image diffusion models for image & video editing


Friday, March 29, 2024, 02:00pm - 04:00pm


Speaker: Zhixing Zhang

Location: CoRE 305

Committee

Professor Dimitris Metaxas

Assistant Professor Yongfeng Zhang

Associate Professor Konstantinos Michmizos

Professor Mario Szegedy

Event Type: Qualifying Exam

Abstract: Recent advances in diffusion models have significantly impacted both the image and video domains, showcasing remarkable capabilities in text-guided synthesis and editing. In image editing, particularly for single images such as iconic paintings, existing approaches often struggle with overfitting and content preservation. To overcome these challenges, a novel approach introduces a model-based guidance technique that enables pre-trained diffusion models to preserve the original content while incorporating new features directed by textual descriptions. The method also includes a patch-based fine-tuning process, enabling the generation of high-resolution images and demonstrating strong editing capabilities, including style modification, content addition, and object manipulation.

Extending diffusion models to video, text-guided video inpainting presents its own set of challenges: maintaining temporal consistency, handling inpainting types that require different degrees of structural fidelity, and accommodating variable video lengths. Any-Length Video Inpainting with Diffusion Model (AVID) addresses these issues. AVID incorporates effective motion modules and adjustable structure guidance for fixed-length video inpainting, and introduces a Temporal MultiDiffusion sampling pipeline with a middle-frame attention guidance mechanism, enabling the inpainting of videos of any desired length with high quality across a wide range of durations and inpainting types (a minimal illustrative sketch of this sampling idea appears below the publication list). Extensive experiments demonstrate the effectiveness of both methods, pushing the boundaries of what is possible in image and video editing with diffusion models.

Publications:

1. Zhang, Zhixing, Ligong Han, Arnab Ghosh, Dimitris N. Metaxas, and Jian Ren. "SINE: Single Image Editing with Text-to-Image Diffusion Models." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6027-6037, 2023.

2. Zhang, Zhixing, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, and Licheng Yu. "AVID: Any-Length Video Inpainting with Diffusion Model." arXiv preprint arXiv:2312.03816 (2023).
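To make the any-length sampling idea concrete, here is a minimal sketch of a MultiDiffusion-style temporal averaging step, assuming a fixed-length denoiser applied over overlapping sliding windows of frames. This is not AVID's implementation; the names `denoise_window`, `window`, and `stride` are hypothetical.

```python
import torch

# Illustrative sketch (not AVID's code): one denoising step runs a
# fixed-length video model on overlapping temporal windows and averages
# the overlapping predictions, so a model trained on fixed-length clips
# can process a video of arbitrary length T. Assumes T >= window.

def temporal_multidiffusion_step(latents, denoise_window, window=16, stride=8):
    """latents: video latent of shape (T, C, H, W)."""
    T = latents.shape[0]
    out = torch.zeros_like(latents)
    count = torch.zeros(T, 1, 1, 1, device=latents.device)
    starts = list(range(0, T - window + 1, stride))
    if starts[-1] != T - window:      # ensure the final frames are covered
        starts.append(T - window)
    for s in starts:
        out[s:s + window] += denoise_window(latents[s:s + window])
        count[s:s + window] += 1
    return out / count                # average where windows overlap
```

Averaging the predictions where windows overlap keeps neighboring windows consistent on their shared frames, which is what allows a fixed-length model to cover a video of any length.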

Contact: Professor Dimitris Metaxas (Chair)