CS Events

Faculty Candidate Talk

Efficient Algorithms and Hardness for Fundamental Problems in Data Science



Thursday, February 18, 2021, 10:30am


Speaker: Peng Zhang


Peng Zhang is a Postdoctoral Associate in Computer Science at Yale University, under the supervision of Daniel Spielman. She obtained her Ph.D. from Georgia Tech, advised by Richard Peng. Her research lies broadly in designing fast algorithms and understanding their limits. She has worked on structured linear equations and linear programs, and on discrepancy theory and its applications to the design of randomized controlled trials. Peng's work received the Best Student Paper Award at FOCS 2017 and the Georgia Tech College of Computing Dissertation Award in 2019.

Location: Via Zoom Recording

Event Type: Faculty Candidate Talk

Abstract: Two important components of data science are data and computation. Sometimes data is cheap but computation is expensive; sometimes data itself is expensive (for example, when it is collected from experiments with human subjects). In this talk, I will discuss efficient algorithms and hardness results addressing fundamental problems in computation and data.

In the first part of the talk, I will present hardness results and fast algorithms for solving structured linear equations and linear programs, which are central tools in data analysis. Solving arbitrary linear equations and linear programs can be time-consuming, but many linear equations and linear programs arising in practice have additional structure that enables faster solvers. One example is Laplacian linear equations, which can be solved in nearly linear time (Spielman-Teng, 2004). I will show that a slight generalization of Laplacian linear equations is as hard to solve as arbitrary linear equations; however, when we know additional geometric structure about this slightly more general class of linear equations, we can design asymptotically faster solvers. I will then briefly discuss hardness results and fast algorithms for positive linear programs, another example of structure enabling faster solvers.

In the second part of the talk, I will present an algorithm that improves the design of randomized controlled trials, which are widely used to test the effectiveness of interventions. In a trial, we randomly partition the subjects into a treatment group and a control group, and we want the two groups to be similar in covariates, the features we know about the subjects. By exploiting ideas from discrepancy theory, we obtain random partitions with a nearly optimal tradeoff between the gain we have when covariates are correlated with treatment outcomes and the loss we suffer when they are not.
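For readers unfamiliar with the structured systems mentioned above: a graph Laplacian is the matrix with vertex degrees on the diagonal and -1 in each entry corresponding to an edge, and "Laplacian linear equations" are systems L x = b with such a matrix. The sketch below (an illustration of the defining properties only, not a nearly-linear-time solver; the example graph is made up) shows the structure that fast solvers exploit:

```python
import random

# A graph Laplacian has L[i][i] = degree(i) and L[i][j] = -1 for each edge {i, j}.
# This sketch illustrates the structure of Laplacian systems, not a fast solver.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a small example graph
n = 4
L = [[0] * n for _ in range(n)]
for i, j in edges:
    L[i][i] += 1
    L[j][j] += 1
    L[i][j] -= 1
    L[j][i] -= 1

# Every row sums to zero, so the all-ones vector lies in the null space of L.
assert all(sum(row) == 0 for row in L)

# The quadratic form x^T L x equals the sum over edges of (x_i - x_j)^2,
# which is why L is positive semidefinite.
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
quad = sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))
assert abs(quad - sum((x[i] - x[j]) ** 2 for i, j in edges)) < 1e-9
```

These two properties (zero row sums and the edge-difference quadratic form) are exactly the structure absent from arbitrary linear systems.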
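To make the covariate-balance goal in the second part concrete, here is a minimal sketch comparing pure randomization against a standard rerandomization baseline (redraw several assignments and keep the most balanced one). This is only an illustration of the objective; it is not the discrepancy-theoretic algorithm from the talk, and the synthetic covariates are made up:

```python
import random

random.seed(0)
n, d = 100, 3  # n subjects, d covariates (synthetic data for illustration)
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def imbalance(signs):
    # Euclidean norm of the covariate difference between the two groups,
    # where signs[i] = +1 puts subject i in treatment and -1 in control.
    totals = [sum(s * row[j] for s, row in zip(signs, X)) for j in range(d)]
    return sum(t * t for t in totals) ** 0.5

def random_signs():
    return [random.choice([-1, 1]) for _ in range(n)]

def rerandomized(k=20):
    # Standard rerandomization baseline: best of k independent draws.
    return min((random_signs() for _ in range(k)), key=imbalance)

def avg(xs):
    return sum(xs) / len(xs)

pure = [imbalance(random_signs()) for _ in range(200)]
rerand = [imbalance(rerandomized()) for _ in range(200)]
print(avg(pure), avg(rerand))  # rerandomization gives lower average imbalance
```

The tradeoff mentioned in the abstract is that pushing imbalance down reduces randomness in the assignment, which is precisely what discrepancy-theoretic designs balance nearly optimally.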

Contact Host: Dr. Eric Allender

Password: %F#ZJ8no