Learning dexterity

By building simulations that support transfer, we have reduced the problem of controlling a robot in the real world to accomplishing a task in simulation, which is a problem well-suited for reinforcement learning. While the task of manipulating an object in a simulated hand is already somewhat difficult⁠, learning to do so across all combinations of randomized physical parameters is substantially more difficult.

To generalize across environments, it is helpful for the policy to be able to take different actions in environments with different dynamics. Because most dynamics parameters cannot be inferred from a single observation, we used an LSTM⁠(opens in a new window)—a type of neural network with memory—to make it possible for the network to learn about the dynamics of the environment. The LSTM achieved about twice as many rotations in simulation as a policy without memory.

Dactyl learns using Rapid⁠, the massively scaled implementation of Proximal Policy Optimization developed to allow OpenAI Five to solve Dota 2. We use a different model architecture, environment, and hyperparameters than OpenAI Five does, but we use the same algorithms and training code. Rapid used 6144 CPU cores and 8 GPUs to train our policy, collecting about one hundred years of experience in 50 hours.

For development and testing, we validated our control policy against objects with embedded motion tracking sensors to isolate the performance of our control and vision networks.

Authors

Josh Tobin, Bob McGrew, Wojciech Zaremba, Maciek Chociej, Szymon Sidor, Glenn Powell, Jakub Pachocki, Alex Ray, Marcin Andrychowicz, Bowen Baker, Arthur Petron, Matthias Plappert, Rafał Józefowicz, Jonas Schneider, Peter Welinder, Lilian Weng

Contributors

Larissa Schiavo, Jack Clark, Greg Brockman, Ilya Sutskever, Diane Yoon, Ben Barry

Feedback

Thanks to the following for feedback on drafts of this post: Pieter Abbeel, Tamim Asfour, Marek Cygan, Ken Goldberg, Anna Goldie, Edward Mehr, Azalia Mirhoseini, Lerrel Pinto, Aditya Ramesh & Ian Rust.

Cover Art

Ben Barry & Eric Haines