Big Data analytics has taken an important role in modern computing. The availability of an enormous amount of data has led to the proliferation of large-scale, data-intensive applications. Popular Big Data systems are developed in managed languages such as Java, Scala, and C#. This is primarily because these languages enable fast development cycles due to simple usage and automatic memory management. However, a managed runtime comes at a cost which is easily magnified in the context of Big Data, causing unsatisfactory performance and low scalability. Our experience with dozens of real-world systems reveals the root cause is the mismatch between the fundamental assumptions based on which the current runtime is designed and the characteristics of data-intensive workloads.
In this talk, I will present my work in developing a “Big Data” friendly runtime system, solving the mismatches in real-world systems. Specifically, I will discuss two representative components: Yak, a hybrid GC that provides high throughput and low latency, and Skyway, an efficient mechanism to connect managed heaps of different nodes in a cluster.