Robots that learn

We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.

The system is powered by two neural networks: a vision network and an imitation network.

The vision network ingests an image from the robot’s camera and outputs state representing the positions of the objects. As before⁠(opens in a new window), the vision network is trained with hundreds of thousands of simulated images with different perturbations of lighting, textures, and objects. (The vision system is never trained on a real image.)

The imitation network observes a demonstration, processes it to infer the intent of the task, and then accomplishes the intent starting from another starting configuration. Thus, the imitation network must generalize the demonstration to a new setting. But how does the imitation network know how to generalize?

The network learns this from the distribution of training examples. It is trained on dozens of different tasks with thousands of demonstrations for each task. Each training example is a pair of demonstrations that perform the same task. The network is given the entirety of the first demonstration and a single observation from the second demonstration. We then use supervised learning to predict what action the demonstrator took at that observation. In order to predict the action effectively, the robot must learn how to infer the relevant portion of the task from the first demonstration.

Applied to block stacking, the training data consists of pairs of trajectories that stack blocks into a matching set of towers in the same order, but start from different start states. In this way, the imitation network learns to match the demonstrator’s ordering of blocks and size of towers without worrying about the relative location of the towers.

Authors

Peter Welinder, Bob McGrew, Jonas Schneider, Yan Duan, Josh Tobin, Rachel Fong, Alex Ray, Filip Wolski, Vikash Kumar, Jonathan Ho

Authors

Marcin Andrychowicz, Bradly Stadie, Ankur Handa, Matthias Plappert, Erika Reinhardt, Pieter Abbeel, Greg Brockman, Ilya Sutskever, Jack Clark, Wojciech Zaremba