Cloud computing plays a critical role in providing computing resources to many organizations. The relentless demand for cloud services makes reliability and efficiency two primary metrics of interest. However, existing data center system designs fall short on both goals. Specifically, (1) operating systems incur significant overheads in providing virtualization support to cloud applications; (2) network infrastructure incurs excessive cost; and (3) infrastructure problems are notoriously difficult to debug and mitigate.
I will cover three projects that tackle these problems. The main focus of the talk is Slim, an efficient network stack design for container virtualization. Unlike traditional container networking approaches that rely on packet-based network virtualization, Slim virtualizes the network at a per-connection level, lowering the overheads of the operating system. Slim reduces CPU utilization by 11-66% on popular cloud applications such as Memcached, Nginx, PostgreSQL, and Apache Kafka. I will then briefly touch on CorrOpt, a system that reduces packet corruption loss in data center networks by three to six orders of magnitude, and RAIL, a data center network architecture that reduces the total cost of the network by up to 44%. At the end of the talk, I will discuss future trends in the data center system space.