Continuous Control with Deep Reinforcement Learning (DDPG)
Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicholas Heess, Tom Erez, Yuval Tassa, David Silver, & Daan Wierstra
Venue: ICLR 2016

This work focuses on solving the problem of a complex environment combined with a complex (continuous) action space. Formally, the work presents "an actor-critic, model-free algorithm based on the deterministic policy gradient (DPG) that can operate over continuous action spaces". It builds on two prior works, namely the Deep Q Network (DQN) and the DPG algorithm. While DQN showed that a deep neural network enables RL to perform well in more complex tasks, it struggles with large, continuous action spaces. Orthogonally, DPG offers a solution to large action spaces, but on its own does not stably support the use of a deep network. This work extends DPG to fix the instability issues by adding batch normalization and a target network. Batch normalization normalizes each dimension so that the samples in a minibatch have zero mean and unit variance. The target network is a slowly-updated copy of the learned networks, used to provide stable training targets.
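The two stabilization ideas can be sketched concretely. Below is a minimal numpy illustration (not the paper's implementation; function names and the parameter values are illustrative): per-dimension minibatch normalization, and the "soft" target-network update theta' <- tau*theta + (1 - tau)*theta' with a small tau, which makes the target parameters track the learned parameters slowly.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature dimension of a minibatch to zero mean, unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def soft_update(target_params, online_params, tau=0.001):
    """Soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

# Toy minibatch with two features on very different scales.
batch = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
normed = batch_norm(batch)  # each column now has ~zero mean, ~unit variance

# Target weights drift slowly toward the online weights.
target = soft_update([np.array([0.0])], [np.array([1.0])], tau=0.1)
```

The small tau (the paper uses values on this order) means the target network changes slowly between updates, which is what damps the instability of bootstrapped Q-learning targets.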