CS Events Monthly View
SeminarA Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem |
|
||
Wednesday, October 18, 2017, 11:00am |
|||
Bandit learning models the common setting when the decisions of an algorithm feed back into its training data, and it cannot observe counter-factuals. These settings include criminal recidivism prediction (would an inmate have committed another crime had he been released?), lending (would an applicant denied a loan have paid it back?), and drug trials (how would a patient have responded to a different drug?) The main tension in bandit learning is balancing exploration with exploitation. However, exploration, which explicitly sacrifices welfare today in exchange for potential future benefit, can be distasteful when decisions pertain to individuals. For example, it might be considered unethical to give drug A to a patient while knowing drug B often cures their illness merely to learn about drug A’s efficacy. In other settings (like predictive policing), a lack of exploration has been blamed for perpetuating unfairness. This motivates studying the performance of the greedy algorithm, which only exploits and never explores. Although this is not a no-regret algorithm, we show that when adversarially selected contexts are perturbed by a small amount of Gaussian noise, it recovers strong no-regret bounds.
Speaker: Steven Wu
Bio
NULL
Location : CoRE A 301
Committee:
Rutgers/DIMACS Theory of Computing
Event Type: Seminar
:
Organization:
Microsoft Research