
Computer Science Department Colloquium

Mechanistic Interpretability and AI Safety

 


Tuesday, October 15, 2024, 10:30am - 12:00pm

 

Speaker: Zining Zhu

Bio

Zining Zhu is an assistant professor at the Stevens Institute of Technology. He received his PhD from the University of Toronto and the Vector Institute in 2024. He is interested in understanding the mechanisms and abilities of neural network AI systems and in incorporating those findings into controlling them. In the long term, he looks forward to empowering real-world applications with safe and trustworthy AIs that can collaborate with humans.

Location: CoRE 301

Event Type: Computer Science Department Colloquium

Abstract: Recently, LLMs have demonstrated incredible capabilities, leading to concerns about their safety. In this presentation, I argue that one path towards safety is understanding and controlling the mechanisms of LLMs. I will briefly review some of the most promising mechanisms at different granularity levels: representation, module, and neuron. I will present some of our lab’s work at each granularity level and discuss how I expect future mechanistic interpretability research to benefit AI safety.

Contact: Professor Yongfeng Zhang

Zoom Link:
https://rutgers.zoom.us/j/91559644814?pwd=4wFm8EbHrVPktHPyUib6UG8PaQ8b5z.1