Learning complex goals with iterated amplification
We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler subtasks.
Iterated amplification is a method for generating a training signal for the latter types of tasks, under certain assumptions. Namely, although a human can’t perform or judge the whole task directly, we assume that a human can, given a piece of the task, identify the clear, smaller components that make it up. For example, in the networked computer example, a human could break down “defend a collection of servers and routers” into “consider attacks on the servers”, “consider attacks on the routers”, and “consider how the previous two attacks might interact”. Additionally, we assume a human can do very small instances of the task, for example “identify if a specific line in a log file is suspicious”. If these two things hold, then we can build up a training signal for big tasks from human training signals for small tasks, using the human to coordinate their assembly.
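To make the two assumptions concrete, here is a minimal Python sketch on a hypothetical toy task: counting the suspicious lines in a log. The functions `human_judge_line`, `human_decompose`, and `human_combine` are hypothetical stand-ins for the human abilities described above (doing very small instances of the task, breaking a task into pieces, and coordinating assembly of the answers), and the "failed login" rule is an arbitrary assumption. It only illustrates how an answer to a large task can be built entirely from small-task judgments.

```python
from typing import List

# Hypothetical stand-in for a small task a human can do directly:
# "identify if a specific line in a log file is suspicious".
# The "failed login" rule is an arbitrary assumption for this toy.
def human_judge_line(line: str) -> bool:
    return "failed login" in line

# Hypothetical stand-in for a human breaking a task into smaller pieces
# (here: splitting the log into two halves).
def human_decompose(lines: List[str]) -> List[List[str]]:
    mid = len(lines) // 2
    return [lines[:mid], lines[mid:]]

# Hypothetical stand-in for the human coordinating assembly of sub-answers.
def human_combine(sub_answers: List[int]) -> int:
    return sum(sub_answers)

def count_suspicious(lines: List[str]) -> int:
    """Answer a big task using only small-task judgments plus decomposition."""
    if len(lines) <= 1:
        # Base case: a task small enough for a human to do directly.
        return sum(1 for line in lines if human_judge_line(line))
    # Recursive case: decompose, solve each piece, then assemble the answers.
    return human_combine([count_suspicious(part) for part in human_decompose(lines)])

log = [
    "ok",
    "failed login from 10.0.0.1",
    "ok",
    "failed login from 10.0.0.2",
]
print(count_suspicious(log))  # prints 2
```

Nothing in `count_suspicious` ever judges a whole log at once; every answer is assembled from one-line judgments.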
In our implementation of amplification, we start by sampling small subtasks and training the AI system to do them from human demonstrations (humans can do these small tasks). We then sample slightly larger tasks and solve them by asking humans to break each one into small pieces, which the AI systems trained in the previous step can now solve. The solutions to these second-level tasks, obtained with human help, then serve as a training signal for AI systems that solve second-level tasks directly, without human help. We then move on to progressively more composite tasks, iteratively building up the training signal as we go. If the process works, the end result is a fully automated system that can solve highly composite tasks despite starting with no direct training signal for those tasks. This process is somewhat similar to expert iteration (the method used in AlphaGo Zero), except that expert iteration reinforces an existing training signal, while iterated amplification builds up a training signal from scratch. It also has features in common with several recent learning algorithms that use problem decomposition on the fly to solve a problem at test time, but it differs in that it operates in settings where there is no prior training signal.
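The level-by-level procedure can be illustrated with a deliberately tiny toy, under loudly stated assumptions: a level-k task is a list of 2**k digits to be summed, humans can only answer single-digit (level-0) tasks directly, and the "AI system" at each level is a least-squares linear model rather than anything like the systems in the work described here. All names (`human_demonstrate`, `human_decompose`, `human_combine`, `LinearModel`, `iterated_amplification`) are hypothetical. It is a sketch of the shape of the training loop (demonstrate, then alternately amplify and distill), not the actual implementation.

```python
import random
from typing import List, Sequence

# Hypothetical toy: a level-k task is a list of 2**k digits whose sum we want.
# Humans can only answer level-0 tasks (a single digit) directly.

def human_demonstrate(task: Sequence[int]) -> float:
    """Human demonstration for the smallest tasks (level 0)."""
    return float(task[0])

def human_decompose(task: Sequence[int]) -> List[Sequence[int]]:
    """Human splits a level-k task into two level-(k-1) subtasks."""
    mid = len(task) // 2
    return [task[:mid], task[mid:]]

def human_combine(answers: List[float]) -> float:
    """Human assembles the subtask answers into an answer for the whole task."""
    return sum(answers)

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

class LinearModel:
    """Stand-in for a learned AI system: least-squares fit of answer ~ w . task."""

    def __init__(self) -> None:
        self.w: List[float] = []

    def fit(self, tasks: List[List[int]], labels: List[float]) -> "LinearModel":
        n = len(tasks[0])
        # Normal equations: (X^T X) w = X^T y.
        xtx = [[float(sum(t[i] * t[j] for t in tasks)) for j in range(n)] for i in range(n)]
        xty = [sum(t[i] * y for t, y in zip(tasks, labels)) for i in range(n)]
        self.w = solve_linear(xtx, xty)
        return self

    def __call__(self, task: Sequence[int]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, task))

def sample_tasks(level: int, count: int, rng: random.Random) -> List[List[int]]:
    return [[rng.randint(0, 9) for _ in range(2 ** level)] for _ in range(count)]

def iterated_amplification(max_level: int, samples: int = 40, seed: int = 0) -> LinearModel:
    rng = random.Random(seed)
    # Level 0: train on human demonstrations of the smallest tasks.
    tasks = sample_tasks(0, samples, rng)
    model = LinearModel().fit(tasks, [human_demonstrate(t) for t in tasks])
    for level in range(1, max_level + 1):
        tasks = sample_tasks(level, samples, rng)
        # Amplification: the human decomposes each task and the previous
        # model solves the pieces; no human ever solves a level-k task whole.
        labels = [human_combine([model(s) for s in human_decompose(t)]) for t in tasks]
        # Distillation: train a new model to answer level-k tasks directly.
        model = LinearModel().fit(tasks, labels)
    return model

model = iterated_amplification(max_level=3)  # final tasks have 2**3 = 8 digits
print(round(model([1, 2, 3, 4, 5, 6, 7, 8]), 6))  # prints 36.0
```

No human labeled any task longer than a single digit, yet the final model sums eight digits directly. The linear model is used only because it can represent this toy task exactly; it is a simplifying assumption, not part of the method described above.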