CS Events
Computer Science Department Colloquium: Mechanistic Interpretability and AI Safety
Tuesday, October 15, 2024, 10:30am - 12:00pm
Speaker: Zining Zhu
Bio
Zining Zhu is an assistant professor at the Stevens Institute of Technology. He received his PhD from the University of Toronto and the Vector Institute in 2024. He is interested in understanding the mechanisms and abilities of neural network AI systems, and in incorporating these findings into controlling AIs. In the long term, he looks forward to empowering real-world applications with safe and trustworthy AIs that can collaborate with humans.
Location: CoRE 301
Event Type: Computer Science Department Colloquium
Abstract: Recently, LLMs have demonstrated incredible capabilities, leading to concerns about their safety. In this presentation, I argue that one path toward safety is understanding and controlling the mechanics of LLMs. I will briefly review some of the most promising mechanisms at different granularity levels: representation, module, and neuron. I will present some of our lab's work at each granularity level, and discuss how I expect future mechanistic interpretability research to benefit AI safety.
Contact: Professor Yongfeng Zhang