CS Events
Computer Science Department Colloquium
Backdoor in AI: Algorithms, Attacks, and Defenses
Tuesday, April 02, 2024, 10:30am
Speaker: Ruixiang (Ryan) Tang
Bio
Ruixiang (Ryan) Tang is a final-year Ph.D. student at Rice University. His research focuses on Trustworthy Artificial Intelligence (AI), with particular emphasis on security, privacy, and explainability. He has over 20 research papers in leading machine learning, data mining, and natural language processing venues such as NeurIPS, ICLR, AAAI, KDD, WWW, TKDD, ACL, EMNLP, NAACL, and Communications of the ACM. Additionally, he collaborates closely with healthcare institutions, such as Yale, Baylor, and UTHealth, to facilitate the deployment of reliable large language models in the healthcare sector. He has received the AMIA'23 Best Student Paper Award, the AMIA'22 Best Student Paper (Shortlist) Award, and the CIKM'23 Honorable Mention for Best Demo Paper Award.
Abstract
As deep learning models are increasingly integrated into critical domains, their safety has become a central concern. This talk delves into the emerging threat of backdoor attacks. These attacks embed a backdoor function within the victim model, allowing attackers to manipulate the model's behavior using specific triggers. The talk will begin with a novel post-training backdoor attack that injects a few malicious neurons into a target model; the attack is training-free and model-agnostic. The talk will then introduce a novel and effective defense mechanism that uses a honeypot module to attract backdoor-related functions. In this way, the model is guided to disentangle the harmful backdoor learning from its utility tasks. The talk will also explore the security risks in advanced large language models, with a focus on preventing potential misuse, and will present an effective defense method against malicious instruction-tuning attacks. Finally, I will conclude with an overview of my research in trustworthy AI and outline future research directions.
Join Zoom Meeting
https://rutgers.zoom.us/j/2014444359?pwd=WW9ybFNCNVFrUWlycHowSHdNZjhzUT09
Meeting ID: 201 444 4359
Password: 550978
Location: CoRE 301
Event Type: Computer Science Department Colloquium