He Zhu receives PLDI 2019 distinguished paper award

Tuesday, July 23, 2019
 
He Zhu, a new faculty member in computer science, received a Distinguished Paper Award at the 2019 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), a top-tier programming languages conference held in Phoenix in June. He co-authored the paper, entitled "An Inductive Synthesis Framework for Verifiable Reinforcement Learning," with Zikang Xiong, a doctoral student at Purdue University; Suresh Jagannathan, a professor at Purdue University; and Stephen Magill, a principal scientist at Galois, Inc. It was one of six papers to share the award from among 76 papers accepted out of 274 submissions. The paper presents the authors' novel idea of using constrained and symbolic abstractions to represent and assure the safety of learning models, a step toward combining deep learning and symbolic reasoning.
 
Despite the tremendous advances made over the last decade in developing useful machine-learning applications, their wider adoption has been hindered by the lack of strong safety guarantees about their behavior: robots trained in controlled settings routinely fail in the real world. The awarded paper applies program verification to certify a machine-learning function. The technique synthesizes provably safe programmatic policies: simpler, more interpretable approximations of neural network policies in a reinforcement learning environment, which remain safe even when the environment is unanticipated or previously unobserved. To handle the performance gap between programmatic and neural policies, the method executes the unverified but higher-performance neural policy rather than the verified but lower-performance programmatic policy, monitoring the system state during execution. If the system is about to execute an unsafe action, control switches to the programmatic policy (the "shield"), whose actions are guaranteed to be safe. Because unsafe actions tend to be executed rarely, a reasonable assumption for properly trained neural policies, this approach guarantees safety without significantly compromising performance.
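The runtime switching idea described above can be sketched in a few lines. The function and policy names below, along with the one-dimensional toy dynamics and safety check, are hypothetical illustrations, not the paper's actual artifact or benchmarks:

```python
# Illustrative sketch of runtime shielding: prefer the unverified neural
# policy, but fall back to a verified programmatic policy whenever the
# neural action would leave the safe region. All names and the toy
# dynamics are hypothetical, chosen only to make the idea concrete.

def shielded_step(x, neural_policy, shield_policy, is_safe):
    """Advance one step: run the neural policy, but switch to the
    verified shield if the proposed action would be unsafe."""
    a = neural_policy(x)
    if not is_safe(x, a):
        a = shield_policy(x)  # verified fallback, guaranteed safe
    return x + a  # toy dynamics: next state = state + action

# Toy instantiation: the safety property is |x| <= 1.
neural = lambda x: 0.8          # stand-in for a high-performance neural policy
shield = lambda x: -0.5 * x     # simple verified linear controller
safe   = lambda x, a: abs(x + a) <= 1.0

x = shielded_step(0.5, neural, shield, safe)  # neural action unsafe, shield used
```

Starting from x = 0.5, the neural action 0.8 would reach 1.3 and violate the safety bound, so the shield's action -0.25 is taken instead; from x = 0.1 the neural action is safe and runs unmodified, which is why performance is barely affected when unsafe proposals are rare.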