CS Events
Computer Science Department Colloquium
Building Better Data-Intensive Systems Using Machine Learning
Tuesday, April 04, 2023, 10:30am - 11:30am
Speaker: Ibrahim Sabek
Bio
Ibrahim Sabek is a postdoc at MIT and an NSF/CRA Computing Innovation Fellow. He is interested in
building the next generation of machine learning-empowered data management, processing, and analysis
systems. Before MIT, he received his Ph.D. from the University of Minnesota, Twin Cities, where he studied
machine learning techniques for spatial data management and analysis. His Ph.D. work received the
University-wide Best Doctoral Dissertation Honorable Mention from the University of Minnesota in 2021.
He was also awarded first place in the graduate student research competition (SRC) at ACM
SIGSPATIAL 2019 and the best paper runner-up at ACM SIGSPATIAL 2018.
Location: CoRE 301
Event Type: Computer Science Department Colloquium
Abstract: Database systems have traditionally relied on handcrafted approaches and rules to store large-scale data and process user queries over it. These well-tuned approaches and rules work well for the general-purpose case, but are seldom optimal for any actual application because they are not tailored to that application's specific properties (e.g., user workload patterns). One possible solution is to build a specialized system from scratch, tailored to each application's needs. Although such a specialized system can achieve orders-of-magnitude better performance, building it is time-consuming and requires substantial manual effort. This creates a need for automated solutions that abstract away system-building complexities while getting as close as possible to the performance of specialized systems.

In this talk, I will show how we leverage machine learning to instance-optimize the performance of query scheduling and execution operations in database systems. In particular, I will show how deep reinforcement learning can fully replace a traditional query scheduler. I will also show that, in certain situations, even simpler learned models, such as piecewise linear models approximating the cumulative distribution function (CDF) of data, can help improve the performance of fundamental data structures and execution operations, such as hash tables and in-memory join algorithms.
Contact: Amelie Marian