A Step Further Toward Scalable and Automatic Distributed Large Language Model Pre-training
Tuesday, February 27, 2024, 10:30am - 12:00pm
Speaker: Hongyi Wang
Bio
Hongyi Wang is a Senior Project Scientist in the Machine Learning Department at CMU, working with Prof. Eric Xing. He obtained his Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison, where he was advised by Prof. Dimitris Papailiopoulos. Dr. Wang received the Rising Stars Award from the Conference on Parsimony and Learning in 2024 and the Baidu Best Paper Award at the Spicy FL workshop at NeurIPS 2020. He led the distributed training effort of LLM360, an academic research initiative advocating for fully transparent open-source LLMs. His research has been adopted by companies including IBM, Sony, and FedML Inc., and he is currently funded by NSF, DARPA, and the Semiconductor Research Corporation.
Location: CoRE 301
Event Type: Computer Science Department Colloquium
Abstract: Large Language Models (LLMs), such as GPT and LLaMA, are at the forefront of advances in the field of AI. Nonetheless, training these models is computationally daunting, necessitating distributed training methods. Distributed training, however, generally suffers from bottlenecks such as heavy communication costs and the need for extensive performance tuning. In this talk, I will first introduce a low-rank training framework for enhancing communication efficiency in data parallelism. The proposed framework achieves near-linear scalability without sacrificing model quality by leveraging a full-rank-to-low-rank training strategy and a layer-wise adaptive rank selection mechanism. Hybrid parallelism, which combines data and model parallelism, is essential for LLM pre-training. However, designing effective hybrid parallelism strategies requires substantial tuning effort and strong expertise. I will discuss how to automatically design high-throughput hybrid-parallelism training strategies using system cost models. Finally, I will demonstrate how to use the automatically designed hybrid parallelism strategies to train state-of-the-art LLMs.
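To make the first idea in the abstract concrete, the sketch below illustrates the general pattern of low-rank training for communication-efficient data parallelism: warm up in full rank, choose a per-layer rank from the spectrum of the learned weights, then continue training with factorized layers so only the small factors (and their gradients) need to be synchronized. This is a minimal illustration of the general technique, not the speaker's implementation; the names `LowRankLinear`, `select_rank`, and `energy_threshold` are assumptions introduced here for clarity.

```python
# Minimal sketch (illustrative only) of low-rank factorized training for
# communication-efficient data parallelism.
import torch
import torch.nn as nn


def select_rank(weight: torch.Tensor, energy_threshold: float = 0.9) -> int:
    """Pick the smallest rank whose singular values retain the given fraction
    of spectral energy (a simple layer-wise adaptive rank selection rule)."""
    s = torch.linalg.svdvals(weight)
    energy = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    return int(torch.searchsorted(energy, torch.tensor(energy_threshold)).item()) + 1


class LowRankLinear(nn.Module):
    """Linear layer factorized as W ~= U @ V. Under data parallelism, only the
    gradients of U and V (r * (d_out + d_in) values) would be all-reduced,
    rather than the full d_out * d_in gradient; that is the source of the
    communication saving."""

    def __init__(self, full_layer: nn.Linear, rank: int):
        super().__init__()
        W = full_layer.weight.data  # (d_out, d_in), taken after full-rank warm-up
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        sqrt_s = torch.sqrt(S[:rank])
        self.U = nn.Parameter(U[:, :rank] * sqrt_s)              # (d_out, r)
        self.V = nn.Parameter(sqrt_s.unsqueeze(1) * Vh[:rank])   # (r, d_in)
        self.bias = nn.Parameter(full_layer.bias.data.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.V.t()) @ self.U.t() + self.bias


if __name__ == "__main__":
    # Toy usage: factorize one layer after a (skipped) full-rank warm-up phase.
    full = nn.Linear(1024, 1024)
    r = select_rank(full.weight.data, energy_threshold=0.9)
    low_rank = LowRankLinear(full, rank=r)
    x = torch.randn(8, 1024)
    print("chosen rank:", r, "output shape:", tuple(low_rank(x).shape))
```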
Contact: Professor Dimitris Metaxas