How Microsoft AI defeated Ms. Pac-Man

with Rahul Mehrotra and Tavian Barnes

Microsoft Research developed a model called Hybrid Reward Architecture (HRA) for scaling reinforcement learning (RL) to tasks with extremely complex value functions and very large state spaces. The model achieved the highest possible score on Ms. Pac-Man, a previously unsolved task, outperforming state-of-the-art models by 10x and the human baseline by 50%. The technique decomposes a large, complex problem into many smaller, simpler subproblems, each handled by its own agent, and the skills each agent develops can be reused across similar tasks. This line of research makes RL more tractable for general problems in enterprise settings, where organizations need multidisciplinary teams with different skill sets to find a solution. In this session, we go behind the scenes of the science and fun that drove this industry-level achievement.
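The decomposition described above can be illustrated with a toy, tabular sketch. This is not Microsoft's implementation: the corridor environment, pellet positions, learning rate, discount, and the names `HybridAgent`, `train`, `step`, and `component_rewards` are all hypothetical. Each reward component (one per pellet) gets its own Q-table, or "head", trained with ordinary Q-learning on that component's reward alone, and the agent acts greedily on the unweighted sum of the heads' Q-values. This is a simplified take on the idea; the published architecture uses neural-network heads and further refinements.

```python
# Toy sketch of reward decomposition with per-component Q-value "heads".
# Hypothetical setup (not from the talk): a 1-D corridor with two pellets.

N_CELLS = 7           # hypothetical corridor: cells 0..6
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
PELLETS = (0, 6)      # hypothetical pellet cells; each pellet is one reward component


def step(pos, move):
    """Deterministic transition: move one cell, clamped to the corridor."""
    return min(max(pos + move, 0), N_CELLS - 1)


def component_rewards(next_pos):
    """Reward vector: component k pays 1.0 when the agent lands on pellet k."""
    return [1.0 if next_pos == p else 0.0 for p in PELLETS]


class HybridAgent:
    """One tabular Q-function per reward component, combined by summation."""

    def __init__(self, n_heads, alpha=0.5, gamma=0.9):
        # q[k][s][a]: head k's estimate for taking action a in state s
        self.q = [[[0.0] * len(ACTIONS) for _ in range(N_CELLS)]
                  for _ in range(n_heads)]
        self.alpha = alpha
        self.gamma = gamma

    def aggregated_q(self, s):
        # Unweighted sum of every head's Q-value for each action
        return [sum(head[s][a] for head in self.q) for a in range(len(ACTIONS))]

    def act(self, s):
        # Greedy with respect to the aggregated Q-values (ties go to the lower index)
        qs = self.aggregated_q(s)
        return qs.index(max(qs))

    def update(self, s, a, rewards, s_next):
        # Each head runs ordinary Q-learning on its own reward component only
        for k, head in enumerate(self.q):
            reached = rewards[k] > 0  # reaching pellet k ends head k's episode
            target = rewards[k] if reached else self.gamma * max(head[s_next])
            head[s][a] += self.alpha * (target - head[s][a])


def train(agent, sweeps=500):
    # Off-policy sweeps over every (state, action) pair of the deterministic toy
    for _ in range(sweeps):
        for s in range(N_CELLS):
            for a, move in enumerate(ACTIONS):
                s_next = step(s, move)
                agent.update(s, a, component_rewards(s_next), s_next)


if __name__ == "__main__":
    agent = HybridAgent(n_heads=len(PELLETS))
    train(agent)
    print(ACTIONS[agent.act(2)])  # prints -1: move toward the nearer pellet at cell 0
```

Each head only has to learn an easy value function (how far away one pellet is), yet acting on their sum steers the agent toward the nearer pellet. In this toy, that is the trade the decomposition makes: one harder value function is replaced by several simple ones.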