Inferring Time-delayed Hidden Causal Relations In Reinforcement Learning
Friday, May 22, 2020, 01:00pm - 02:00pm
Location: Remote via Webex
Committee: Prof. Abdeslam Boularias, Prof. Sungjin Ahn, Prof. Yongfeng Zhang, Prof. Shubhangi Saraf
Event Type: Qualifying Exam
Abstract: Despite the remarkable progress of deep RL agents in reaching and surpassing human-level performance, they still lag behind humans in data efficiency. Model-based RL algorithms arguably require less data than model-free ones, but learning models that are accurate enough for planning remains a challenging problem. The difficulty in learning accurate predictive models can largely be attributed to the partial observability of the states. In robotics, for example, the Markov condition is seldom satisfied: future states and rewards often depend on the entire history of actions and observations. Recurrent architectures such as LSTMs and GRUs are general-purpose tools for handling partial observability by discovering and remembering pertinent information; however, they tend to require large amounts of data and are difficult to interpret. To address these two issues, we present an approach that combines the merits of general function approximators, such as neural networks, with probabilistic graphical models for representing hidden variables. Given a stream of actions, observations, and rewards, a neural network is trained to predict future observations and rewards. Simultaneously, a graphical model of causal relations between observations occurring at different time steps is gradually constructed. The values of the variables in this graph are provided to the neural network as additional inputs alongside the observations. The learned predictive model is then used by the agent to select actions based on their predicted future rewards.
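The two-part idea described in the abstract (discover a time-delayed causal edge, then feed the lagged cause into a predictive model as an extra input) can be illustrated with a minimal sketch. This is not the talk's actual method: the toy process, the lag-scanning heuristic based on correlations, and the least-squares predictor standing in for the neural network are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partially observable process: a hidden cause h_t
# influences the observation exactly `delay` steps later.
T, delay = 500, 3
h = rng.standard_normal(T)
obs = np.zeros(T)
obs[delay:] = 0.9 * h[:-delay] + 0.1 * rng.standard_normal(T - delay)

# Step 1: infer the time-delayed edge h -> obs by scanning lagged
# correlations (a stand-in for gradually building the causal graph).
def best_lag(cause, effect, max_lag=6):
    scores = [abs(np.corrcoef(cause[:-k], effect[k:])[0, 1])
              for k in range(1, max_lag + 1)]
    return int(np.argmax(scores)) + 1

lag = best_lag(h, obs)

# Step 2: augment the predictor's input with the lagged hidden value
# and fit a one-step predictor of the next observation (least squares
# stands in for the neural network described in the abstract).
X = np.stack([obs[lag - 1:-1],   # previous observation
              h[:T - lag]],      # hidden cause, shifted by the found lag
             axis=1)
y = obs[lag:]                    # next observation to predict
w, *_ = np.linalg.lstsq(X, y, rcond=None)

mse = np.mean((X @ w - y) ** 2)
print("discovered lag:", lag, "one-step MSE:", round(mse, 4))
```

Without the lagged hidden input, a memoryless predictor of this process would be stuck near the variance of `obs`; with it, the residual error drops to roughly the injected noise level, which is the motivation for supplying the graph's variable values to the network.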