CS Events

Computer Science Department Colloquium

A Step Further Toward Scalable and Automatic Distributed Large Language Model Pre-training

 


Tuesday, February 27, 2024, 10:30am - 12:00pm

 

Speaker: Hongyi Wang

Bio

Hongyi Wang is a Senior Project Scientist in the Machine Learning Department at CMU, working with Prof. Eric Xing. He obtained his Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison, where he was advised by Prof. Dimitris Papailiopoulos. Dr. Wang received the Rising Stars Award from the Conference on Parsimony and Learning in 2024 and the Baidu Best Paper Award at the SpicyFL workshop at NeurIPS 2020. He led the distributed training effort of LLM360, an academic research initiative advocating for fully transparent open-source LLMs. His research has been adopted by companies such as IBM, Sony, and FedML Inc., and his work is currently funded by NSF, DARPA, and the Semiconductor Research Corporation.

Location: CoRE 301

Event Type: Computer Science Department Colloquium

Abstract: Large Language Models (LLMs), such as GPT and LLaMA, are at the forefront of advances in the field of AI. Training these models, however, is computationally daunting and requires distributed training methods, which in turn suffer from bottlenecks such as high communication costs and the need for extensive performance tuning. In this talk, I will first introduce a low-rank training framework for enhancing communication efficiency in data parallelism. The proposed framework achieves almost linear scalability without sacrificing model quality by leveraging a full-rank-to-low-rank training strategy and a layer-wise adaptive rank selection mechanism. Hybrid parallelism, which combines data and model parallelism, is essential for LLM pre-training; however, designing effective hybrid parallelism strategies requires substantial tuning effort and strong expertise. I will discuss how to automatically design high-throughput hybrid-parallelism training strategies using system cost models. Finally, I will demonstrate how to use the automatically designed hybrid parallelism strategies to train state-of-the-art LLMs.
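To give a flavor of the communication-efficiency idea discussed in the abstract, below is a minimal sketch (not the speaker's actual framework) of generic low-rank gradient compression for data parallelism: each layer's gradient matrix is approximated by rank-r factors before being communicated, trading a small approximation error for far fewer values on the wire. The layer size and rank used here are hypothetical.

```python
# Minimal, hypothetical sketch of low-rank gradient compression for
# data-parallel training (illustration only, not the talk's method).
import numpy as np


def compress_gradient(grad: np.ndarray, rank: int):
    """Return rank-`rank` factors (P, Q) such that grad is approximately P @ Q."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank] * S[:rank]   # shape (m, r): left factor scaled by singular values
    Q = Vt[:rank, :]             # shape (r, n): right factor
    return P, Q


def decompress_gradient(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Reconstruct the approximate gradient from its low-rank factors."""
    return P @ Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, r = 1024, 4096, 32                      # hypothetical layer shape and rank
    grad = rng.standard_normal((m, n))

    P, Q = compress_gradient(grad, r)
    approx = decompress_gradient(P, Q)

    full_cost = grad.size                         # floats communicated at full rank
    lowrank_cost = P.size + Q.size                # floats communicated at rank r
    rel_err = np.linalg.norm(grad - approx) / np.linalg.norm(grad)
    print(f"communication reduced {full_cost / lowrank_cost:.1f}x, "
          f"relative error {rel_err:.3f}")
```

In practice, schemes of this kind vary the rank per layer (as the abstract's layer-wise adaptive rank selection suggests) so that layers whose gradients are harder to approximate keep more of their information while easy layers are compressed aggressively.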

Contact: Professor Dimitris Metaxas

Join Zoom Meeting
https://rutgers.zoom.us/j/2014444359?pwd=WW9ybFNCNVFrUWlycHowSHdNZjhzUT09
Meeting ID: 201 444 4359
Password: 550978