Online learning is a canonical paradigm that sheds light on how to act effectively at the face of uncertainty. Classical results on non-stochastic online learning show robust worst-case guarantees on the performance without making any assumption on the input. Recently there is emerging interest to understand how one can use the existence of some nice structure in the data to obtain enhanced guarantees when this structure indeed exists without sacrificing the worst-case performance.
In this talk, I will focus on one such nice structure: the existence of an alternative with very good performance and aim for guarantees that degrade with its performance (small-loss bounds). I will focus on a general graph-based feedback setting where, at each round, a decision-maker selects an action from a finite set of alternatives and receives feedback based on a combinatorial graph-based feedback model introduced by Mannor and Shamir. This encapsulates as special cases important partial feedback settings such as bandits, online routing, and contextual bandits. Small-loss bounds are of particular interest in these partial-feedback settings as, to be satisfied, they require algorithms that do not suffer from an important drawback of classical worst-case techniques: the need to explore often all alternatives, including suboptimal ones. I will provide a general black-box reduction for these settings achieving high-probability small-loss bounds for the graph-based feedback setting and illustrate how our technique actively avoids this over-exploration.
The talk will be mostly based on joint work with Karthik Sridharan and Eva Tardos that appeared in COLT 18 and is currently under revision in Mathematics of Operations Research.