Spinning Up in Deep RL

At OpenAI, we believe that deep learning generally—and deep reinforcement learning specifically—will play central roles in the development of powerful AI technology. While there are numerous resources available to let people quickly ramp up in deep learning, deep reinforcement learning is more challenging to break into. We’ve designed Spinning Up to help people learn to use these technologies and to develop intuitions about them.

We were inspired to build Spinning Up through our work with the OpenAI Scholars and Fellows initiatives, where we observed that people with little to no experience in machine learning can rapidly ramp up as practitioners if the right guidance and resources are available to them. Spinning Up in Deep RL was built with this need in mind and is integrated into the curriculum for the 2019 cohorts of Scholars and Fellows.

We’ve also seen that being competent in RL can help people participate in interdisciplinary research areas like AI safety, which involve a mix of reinforcement learning and other skills. So many people have asked us for guidance on learning RL from scratch that we’ve decided to formalize the informal advice we’ve been giving.

Spinning Up in Deep RL consists of the following core components:

A short introduction to RL terminology, kinds of algorithms, and basic theory.
An essay about how to grow into an RL research role.
A curated list of important papers organized by topic.
A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).
And a few exercises to serve as warm-ups.

We’ve designed the code for Spinning Up with newcomers in mind, making it short, friendly, and as easy to learn from as possible. Our goal was to write minimal implementations to demonstrate how the theory becomes code, avoiding the layers of abstraction and obfuscation typically present in deep RL libraries. We favor clarity over modularity—code reuse between implementations is strictly limited to logging and parallelization utilities. Code is annotated so that you always know what’s going on, and is supported by background material (and pseudocode) on the corresponding readthedocs page.
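As a rough illustration of what "theory becomes code" can look like, the sketch below computes the discounted reward-to-go, a quantity commonly used in policy gradient methods such as VPG. It is not taken from the Spinning Up repo: the function name `discounted_rewards_to_go` and the default discount factor are assumptions chosen for illustration.

```python
def discounted_rewards_to_go(rewards, gamma=0.99):
    """Return R_t = sum over k >= t of gamma^(k - t) * r_k, for each timestep t.

    Hypothetical helper for illustration only; not part of any library.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps of reward 1.0 with gamma = 0.5.
    print(discounted_rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The backward loop mirrors the recursive definition R_t = r_t + gamma * R_{t+1}, which is the kind of one-to-one mapping between equation and code that a minimal implementation tries to keep visible.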

Author

Joshua Achiam

Acknowledgments

Thanks to the many people who contributed to this launch: Alex Ray, Amanda Askell, Ashley Pilipiszyn, Ben Garfinkel, Catherine Olsson, Christy Dennison, Coline Devin, Daniel Zeigler, Dylan Hadfield-Menell, Eric Sigler, Ge Yang, Greg Khan, Ian Atha, Jack Clark, Jonas Rothfuss, Larissa Schiavo, Leandro Castelao, Lilian Weng, Maddie Hall, Matthias Plappert, Miles Brundage, Peter Zokhov & Pieter Abbeel.