Skip to content Skip to navigation
Seminar
4/27/2016 11:00 am
Core A (Room 301)

Reducing contextual bandits to supervised learning

Daniel Hsu, Columbia University

Organizer(s): Eric Allender, Pranjal Awasthi, Michael Saks and Mario Szegedy

Abstract

I'll describe a fast and simple algorithm for the contextual bandit learning problem, where the learner repeatedly takes an action in response to the observed context, observing the reward only for that action. The algorithm assumes access to an oracle for solving cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only about T^{1/2} oracle calls across all T rounds. Joint work with Alekh Agarwal, Satyen Kale, John Langford, Lihong Li, and Rob Schapire.