
Qualifying Exam

Programmatic Reinforcement Learning

 


Wednesday, April 12, 2023, 01:30pm - 03:00pm

 

Abstract: Deep reinforcement learning (RL) has led to encouraging successes in many challenging control tasks. However, deep RL models lack interpretability because it is difficult to identify how a model's control logic relates to its network structure. Programmatic policies, structured in more interpretable representations, have emerged as a promising solution. We investigate two types of programmatic policies that can solve multiple complex tasks.

The first type is the programmatic reinforcement learning policy. To learn such a policy and solve complex tasks, we must overcome the following challenges. First, synthesizing programmatic policies requires optimizing over the discrete, non-differentiable search space of program architectures; previous works are suboptimal because they only enumerate program architectures greedily, guided by a pretrained RL oracle. Second, these works do not exploit compositionality, an important programming concept, to reuse and compose primitive functions into a complex function for new tasks. We address these problems with two major contributions. Our first contribution is a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming-language grammar rules. Our algorithm learns policy architectures jointly with policy parameters via bilevel optimization using efficient policy-gradient methods, and thus does not require a pretrained oracle. Our second contribution improves programmatic policies to support compositionality by integrating primitive functions, learned as task-agnostic skills, into a composite program that solves novel RL problems. Experimental results demonstrate that our algorithm excels at discovering optimal programmatic policies that are highly interpretable.
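The continuous-relaxation idea can be pictured with a small, hypothetical sketch: the discrete choice among candidate grammar-rule expansions is replaced by a softmax mixture over them, so architecture weights and policy parameters can both be trained with policy gradients in alternating (bilevel-style) steps. All names, sizes, and the toy reward below are illustrative assumptions, not the implementation discussed in the talk.

```python
# Minimal sketch, assuming a DARTS-style softmax relaxation over grammar rules.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_rules, obs_dim, act_dim = 3, 4, 2              # assumed toy sizes

# Relaxed "architecture": a softmax over candidate grammar-rule expansions.
arch_logits = torch.zeros(n_rules, requires_grad=True)
# Policy parameters: one small linear controller per candidate rule.
policy_theta = (0.1 * torch.randn(n_rules, act_dim, obs_dim)).requires_grad_(True)

def act(obs):
    """Sample an action from the relaxed (mixture) programmatic policy."""
    w = F.softmax(arch_logits, dim=0)                        # soft rule choice
    mean = torch.einsum("r,rao,o->a", w, policy_theta, obs)  # mixed controller
    dist = torch.distributions.Normal(mean, 0.1)
    a = dist.sample()
    return a, dist.log_prob(a).sum()

def episode_return(action):
    # Stand-in for an environment rollout: reward peaks at a target action.
    target = torch.tensor([1.0, -1.0])
    return -(action - target).pow(2).sum()

opt_theta = torch.optim.Adam([policy_theta], lr=1e-2)
opt_arch = torch.optim.Adam([arch_logits], lr=1e-2)

for step in range(200):
    obs = torch.randn(obs_dim)
    action, logp = act(obs)
    ret = episode_return(action)
    loss = -ret.detach() * logp              # REINFORCE-style surrogate
    # Alternate the two levels of optimization: parameters, then architecture.
    opt = opt_theta if step % 2 == 0 else opt_arch
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned rule weights:", F.softmax(arch_logits, dim=0).tolist())
```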

The second type is neurosymbolic programs with loops and conditionals. Neurosymbolic programs with state-conditioned loops and conditionals can naturally represent long-horizon, procedural, or repetitive tasks that are difficult to perform using deep RL. Yet synthesizing loops and conditionals is challenging, as it requires optimization over the combinatorial search space of symbolic program structures. To address this challenge, we propose a novel programming-by-reward (PBR) algorithm for learning neurosymbolic programs. For loops, PBR uses the task reward of searched programs to bias its search toward potentially high-reward loop structures. For conditionals, PBR lazily synthesizes the conditional branches of a program on an as-needed basis, only when the inferred axiomatic semantics of the partial program and its actual execution results disagree. Experimental results demonstrate that PBR significantly outperforms state-of-the-art deep RL algorithms and standard program synthesis baselines on challenging RL tasks, including video games, locomotion control, and embodied AI home-assistant robots in complex household environments.
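As a rough illustration of reward-guided search over loop structures, the toy sketch below enumerates candidate loop bodies from a small action DSL and uses the task reward of each evaluated candidate to decide which structures to expand next. The DSL, the reward function, and all identifiers are assumptions for illustration only, not the PBR implementation described in the abstract, and the lazy synthesis of conditional branches is omitted.

```python
# Minimal sketch, assuming a tiny action DSL and a toy reward function.
import heapq

PRIMITIVES = ["step_forward", "turn_left", "turn_right"]   # assumed toy DSL

def run_program(body, n_loops=5):
    """Toy stand-in for executing `repeat n_loops: body` in an environment.
    The reward favors loop bodies that mostly step forward."""
    score = sum(1.0 if a == "step_forward" else -0.2 for a in body)
    return n_loops * score

def reward_biased_search(max_body_len=3, budget=20):
    # Frontier of (negative reward, candidate loop body): higher-reward
    # partial programs are popped and extended first.
    frontier = [(0.0, ())]
    best_return, best_body = float("-inf"), ()
    evaluated = 0
    while frontier and evaluated < budget:
        _, body = heapq.heappop(frontier)
        if body:
            ret = run_program(list(body))
            evaluated += 1
            if ret > best_return:
                best_return, best_body = ret, body
        if len(body) < max_body_len:
            for prim in PRIMITIVES:
                child = body + (prim,)
                # Bias the search: rank children by the reward of the partial body.
                heapq.heappush(frontier, (-run_program(list(child)), child))
    return best_body, best_return

print(reward_biased_search())    # expected: a loop body of step_forward actions
```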

 

Speaker: Wenjie Qiu

Location: CoRE 301

Committee

Prof. He Zhu (advisor)

Prof. Shiqing Ma

Prof. Srinivas Narayana

Prof. Qiong Zhang

 

Event Type: Qualifying Exam

Abstract: See above

Organization

Rutgers University

School of Arts & Sciences

Department of Computer Science

Contact: Professor He Zhu