Summarizing books with human feedback

To safely deploy powerful, general-purpose artificial intelligence in the future, we need to ensure that machine learning models act in accordance with human intentions. This challenge has become known as the alignment problem.

A scalable solution to the alignment problem needs to work on tasks where model outputs are difficult or time-consuming for humans to evaluate. To test scalable alignment techniques, we trained a model to summarize entire books, as shown in the following samples. Our model works by first summarizing small sections of a book, then summarizing those summaries into a higher-level summary, and so on.
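The recursive procedure described above can be illustrated with a minimal Python sketch. This is not the model's actual implementation: the function names (`summarize_recursively`, `chunk`, `first_words`), the `group_size` parameter, and the fixed-size grouping are all assumptions made for illustration, and the toy summarizer simply keeps the first few words of its input as a stand-in for a learned summarization model.

```python
from typing import Callable, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def summarize_recursively(
    sections: List[str],
    summarize: Callable[[str], str],
    group_size: int = 3,
) -> str:
    """Summarize sections, then summarize groups of summaries until one remains.

    `summarize` is a hypothetical stand-in for a trained summarization model.
    """
    if not sections:
        raise ValueError("need at least one section")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")

    # Level 0: summarize each small section of the book on its own.
    level = [summarize(section) for section in sections]

    # Higher levels: concatenate neighboring summaries and summarize again.
    while len(level) > 1:
        level = [summarize(" ".join(group)) for group in chunk(level, group_size)]

    return level[0]


def first_words(text: str, limit: int = 8) -> str:
    """Toy summarizer (assumption): keep only the first `limit` words."""
    return " ".join(text.split()[:limit])


if __name__ == "__main__":
    book_sections = [
        "Alice follows a rabbit down a hole and falls for a long time.",
        "She finds a small door and a bottle labeled DRINK ME.",
        "She meets a caterpillar who asks her who she is.",
        "She attends a tea party where time has stopped.",
    ]
    print(summarize_recursively(book_sections, first_words, group_size=2))
```

One property this structure makes visible is that each call to the summarizer sees only a short input, either a section of the book or a handful of lower-level summaries, so every individual step is a task a person could check without reading the whole book.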

Footnotes

These samples were selected from works in the public domain and are part of GPT-3’s pretraining data. To control for this effect, and purely for research purposes, our paper evaluates summaries of books the model has never seen before.

We’ve amended our original claim about results on NarrativeQA after being made aware of prior work with better results than ours.

Authors

Jeffrey Wu, Ryan Lowe, Jan Leike

Acknowledgments

We’d like to acknowledge our paper co-authors: Long Ouyang, Daniel Ziegler, Nisan Stiennon, and Paul Christiano.

Thanks to the following for feedback on this release: Steve Dowling, Hannah Wong, Miles Brundage, Gretchen Krueger, Ilya Sutskever, and Sam Altman.

Design: Justin Jay Wang

Book Cover Artwork: DALL·E