CS Events

Computer Science Department Colloquium

Backdoor in AI: Algorithms, Attacks, and Defenses


Tuesday, April 02, 2024, 10:30am


Speaker: Ruixiang (Ryan) Tang


Ruixiang (Ryan) Tang is a final-year Ph.D. student at Rice University. His research focuses on Trustworthy Artificial Intelligence (AI), with particular emphasis on security, privacy, and explainability. He has published over 20 research papers in leading machine learning, data mining, and natural language processing venues such as NeurIPS, ICLR, AAAI, KDD, WWW, TKDD, ACL, EMNLP, NAACL, and Communications of the ACM. He also collaborates closely with healthcare institutions such as Yale, Baylor, and UTHealth to facilitate the deployment of reliable large language models in the healthcare sector. His honors include the AMIA'23 Best Student Paper Award, the AMIA'22 Best Student Paper (Shortlist) Award, and the CIKM'23 Honorable Mention for Best Demo Paper Award.

Join Zoom Meeting

Meeting ID: 201 444 4359

Password: 550978

Location: CoRE 301

Event Type: Computer Science Department Colloquium

Abstract: As deep learning models are increasingly integrated into critical domains, their safety has become a pressing concern. This talk examines the emerging threat of backdoor attacks, in which an attacker embeds a backdoor function within a victim model and then manipulates the model's behavior using specific triggers. The talk will begin with a novel post-training backdoor attack that injects a few malicious neurons into a target model; the attack is training-free and model-agnostic. The talk will then introduce an effective defense mechanism that uses a honeypot module to attract backdoor-related functions, guiding the model to disentangle harmful backdoor learning from its utility tasks. The talk will also explore security risks in advanced large language models, with a focus on preventing potential misuse, and will present an effective defense method against malicious instruction-tuning attacks. Finally, I will conclude with an overview of my research in trustworthy AI and an outline of future research directions.