Faculty Candidate Talk
3/7/2019 10:30 am
CoRE A 301

Runtime Supports for Scalable and Efficient Big Data Processing

Khanh Nguyen, UCLA

Faculty Host: Richard Martin

Abstract

Big Data analytics has taken on an important role in modern computing. The availability of an enormous amount of data has led to the proliferation of large-scale, data-intensive applications. Popular Big Data systems are developed in managed languages such as Java, Scala, and C#, primarily because these languages enable fast development cycles through ease of use and automatic memory management. However, a managed runtime comes at a cost that is easily magnified in the context of Big Data, causing unsatisfactory performance and low scalability. Our experience with dozens of real-world systems reveals that the root cause is a mismatch between the fundamental assumptions on which current runtimes are designed and the characteristics of data-intensive workloads.

In this talk, I will present my work in developing a “Big Data” friendly runtime system, solving the mismatches in real-world systems. Specifically, I will discuss two representative components: Yak, a hybrid GC that provides high throughput and low latency, and Skyway, an efficient mechanism to connect managed heaps of different nodes in a cluster.

Bio

Khanh Nguyen is a Ph.D. candidate in the Computer Science Department at UCLA, working with Harry Xu at the intersection of systems and programming languages. He has led the development of compiler and runtime system support that improves the performance of several real-world Big Data systems such as Spark and Hadoop. His work has attracted much attention from both academia and industry. He is a recipient of the Google Ph.D. Fellowship in Systems and Networking and a Facebook Ph.D. Fellowship Finalist.