Traditional learning algorithms based on action feedback in new or uncertain environments try to strike a balance between 'exploitation' - relying on actions or avenues currently believed to be optimal - and 'exploration' - trying new actions or untested avenues to gain more information and potentially discover new, better options. Relying too heavily on exploitation risks missing beneficial actions and trapping an agent in sub-optimal behaviors. Relying too heavily on exploration risks forgoing the benefits of known, high-reward actions. Classical solutions allow some probability of exploration to encourage discovery, but diminish this probability over time to encourage exploitation. These policies offer some guarantees on the discovery and utilization of optimal actions, but are frequently naive, exploring blindly by chance. In this talk, I present a number of environments for learning optimal actions, and show how the data collected from the available actions can not only inform 'exploitation' actions, but also direct exploration - determining which actions are worth exploring, and just how much. These data-driven exploration policies have a number of nice properties, including minimal loss or regret, in an appropriate sense, over time. While these necessarily require greater computational expenditures than a blind or naive policy, the payoff comes in smaller data requirements overall. Computational simplifications and implications will also be discussed.
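To make the contrast concrete, here is a minimal sketch in Python, under assumptions that are not part of the abstract: a Bernoulli multi-armed bandit as the environment, a decaying epsilon-greedy rule standing in for the classical "blind" policy, and UCB1 standing in for a data-driven exploration policy. The function names, the epsilon schedule, and the arm means are all hypothetical choices for illustration; the policies presented in the talk may differ.

```python
import math
import random


def run_bandit(means, horizon, choose, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds.

    Returns (cumulative expected regret, per-arm play counts).
    `means` are the true success probabilities, hidden from the policy.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm was played
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = choose(t, counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret, counts


def decaying_epsilon_greedy(t, counts, sums, rng):
    """Blind exploration: explore uniformly at random with probability
    epsilon_t = min(1, k / t) (an assumed schedule), otherwise exploit."""
    k = len(counts)
    if rng.random() < min(1.0, k / t):
        return rng.randrange(k)
    # Exploit the highest empirical mean; unplayed arms count as 0.
    return max(range(k), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)


def ucb1(t, counts, sums, rng):
    """Data-driven exploration: play the arm with the largest empirical mean
    plus a bonus that shrinks as that arm accumulates data."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    for name, policy in [("epsilon-greedy", decaying_epsilon_greedy), ("UCB1", ucb1)]:
        regret, counts = run_bandit(means, 5000, policy)
        print(name, "regret:", round(regret, 1), "plays per arm:", counts)
```

In the UCB1 rule, how often an arm is explored depends on how much data has already been collected from it, whereas the epsilon-greedy rule spreads its exploration evenly over all arms regardless of what has been observed. The extra per-round index computation is a small instance of the computational cost mentioned above.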
After successfully fleeing the frigid wastelands of undergraduate mathematics at Carnegie Mellon University, Wes Cowan received his PhD in Mathematics from Rutgers in 2016, focusing on applied probability and sequential-decision problems. Since then, Wes has been a post-doc at Rutgers, applying his research in the areas of AI and Machine Learning, and generally giving his students a hard time. His primary interests lie in using probability to model uncertain knowledge and belief about the world, so as to best inform not only what we know and believe today, but also what we should do tomorrow.