CS Events

Computer Science Department Colloquium

Building Better Data-Intensive Systems Using Machine Learning

 

Tuesday, April 04, 2023, 10:30am - 11:30am

 

Speaker: Ibrahim Sabek

Bio

Ibrahim Sabek is a postdoc at MIT and an NSF/CRA Computing Innovation Fellow. He is interested in
building the next generation of machine learning-empowered data management, processing, and analysis
systems. Before MIT, he received his Ph.D. from the University of Minnesota, Twin Cities, where he studied
machine learning techniques for spatial data management and analysis. His Ph.D. work received the
University-wide Best Doctoral Dissertation Honorable Mention from the University of Minnesota in 2021.
He was also awarded first place in the graduate student research competition (SRC) at ACM
SIGSPATIAL 2019 and was the best paper runner-up at ACM SIGSPATIAL 2018.

Location: CoRE 301

Event Type: Computer Science Department Colloquium

Abstract: Database systems have traditionally relied on handcrafted approaches and rules to store large-scale data and process user queries over it. These well-tuned approaches and rules work well for the general-purpose case, but are seldom optimal for any actual application because they are not tailored to that application's specific properties (e.g., user workload patterns). One possible solution is to build a specialized system from scratch, tailored to each application's needs. Although such a specialized system can achieve orders-of-magnitude better performance, building it is time-consuming and requires substantial manual effort. This creates a need for automated solutions that abstract away system-building complexities while getting as close as possible to the performance of specialized systems.

In this talk, I will show how we leverage machine learning to instance-optimize the performance of query scheduling and execution operations in database systems. In particular, I will show how deep reinforcement learning can fully replace a traditional query scheduler. I will also show that, in certain situations, even simpler learned models, such as piecewise linear models approximating the cumulative distribution function (CDF) of the data, can help improve the performance of fundamental data structures and execution operations, such as hash tables and in-memory join algorithms.
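To give a rough sense of the last idea in the abstract, here is a minimal Python sketch of a piecewise linear CDF model used in place of a hash function. It is an illustration only and not the speaker's implementation: the function names (fit_piecewise_cdf, cdf, learned_bucket), the equal-quantile choice of segment boundaries, and the bucket count are all assumptions made for this example.

```python
import bisect

def fit_piecewise_cdf(sorted_keys, num_segments):
    """Fit a piecewise linear CDF approximation from sorted keys (needs >= 2 keys).

    Knots are placed at evenly spaced ranks (an assumption of this sketch).
    Returns (knot_keys, knot_cdf_values).
    """
    n = len(sorted_keys)
    knots_x, knots_y = [], []
    for i in range(num_segments + 1):
        idx = (i * (n - 1)) // num_segments
        x = sorted_keys[idx]
        if knots_x and x == knots_x[-1]:
            continue  # skip duplicate knot positions
        knots_x.append(x)
        knots_y.append(idx / (n - 1))
    return knots_x, knots_y

def cdf(model, key):
    """Evaluate the approximate CDF at key by linear interpolation."""
    xs, ys = model
    if key <= xs[0]:
        return 0.0
    if key >= xs[-1]:
        return 1.0
    j = bisect.bisect_right(xs, key)  # xs[j-1] <= key < xs[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (key - x0) / (x1 - x0)

def learned_bucket(model, key, num_buckets):
    """Map a key to a bucket by scaling its estimated CDF value."""
    return min(num_buckets - 1, int(cdf(model, key) * num_buckets))

# Hypothetical skewed key set: perfect squares.
keys = [i * i for i in range(100)]
model = fit_piecewise_cdf(keys, num_segments=10)
buckets = [learned_bucket(model, k, 16) for k in keys]
```

Because the CDF is monotone, such a mapping preserves key order across buckets, and when the model fits the data well the keys spread roughly evenly; whether this beats a conventional hash function depends on the data distribution and the cost of evaluating the model.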

Contact: Amelie Marian

Livestream available via Zoom:
https://rutgers.zoom.us/j/96273316551?pwd=T3BtNU03ZUpZRUxTcVd1T0NaK0Ridz09