Continuous Control with Deep Reinforcement Learning (DDPG)

Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicholas Heess, Tom Erez, Yuval Tassa, David Silver, & Daan Wierstra
Venue: ICLR 2016

This work focuses on solving the problem of a complex environment AND complex, continuous actions at the same time. Formally, the work presents "an actor-critic, model-free algorithm based on the deterministic policy gradient (DPG) that can operate over continuous action spaces". It builds on two prior works, namely the Deep Q-Network (DQN) and the DPG algorithm. While DQN showed that a deep neural network enables RL to perform well in more complex tasks, it only handles discrete, low-dimensional action spaces and becomes unstable when pushed toward larger ones. Orthogonally, DPG offers a solution for continuous action spaces, but on its own does not learn stably with deep networks.

This work extends DPG to fix the instability issues by adding batch normalization and target networks. Batch normalization rescales each dimension so that the samples in a minibatch have zero mean and unit variance. The target networks are decoupled copies of the actor and critic that are updated incrementally: rather than copying the online networks' parameters outright, each target update moves only a small fraction toward them. Although this slows learning, the authors find it a valuable trade-off due to the gains in stability.
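To make the two stabilizing ingredients concrete, here is a minimal sketch in PyTorch. The soft target update and the placement of batch normalization follow the idea described above, but the network sizes, the `tau` value, and the `state_dim`/`action_dim` names are illustrative assumptions, not the authors' exact configuration.

```python
import torch

def soft_update(online_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 0.001):
    """Move each target parameter a small fraction tau toward the online parameter."""
    with torch.no_grad():
        for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
            # target <- tau * online + (1 - tau) * target
            p_target.mul_(1.0 - tau).add_(tau * p_online)

# Hypothetical actor network with batch normalization, which rescales each
# input dimension of a minibatch to zero mean and unit variance.
state_dim, action_dim = 8, 2  # illustrative sizes only
actor = torch.nn.Sequential(
    torch.nn.BatchNorm1d(state_dim),
    torch.nn.Linear(state_dim, 400), torch.nn.ReLU(),
    torch.nn.BatchNorm1d(400),
    torch.nn.Linear(400, 300), torch.nn.ReLU(),
    torch.nn.BatchNorm1d(300),
    torch.nn.Linear(300, action_dim), torch.nn.Tanh(),
)
actor_target = torch.nn.Sequential(*[torch.nn.Identity()])  # placeholder; in practice a copy of `actor`
```

In practice the target network starts as an exact copy of the online network, and `soft_update` is called after every learning step so the targets used by the critic drift only slowly.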

Full Text

See also:
DPG: "Deterministic Policy Gradient Algorithms"
DQN: "Human-Level Control through Deep Reinforcement Learning"
