AI and compute | xifan.uno

We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time (by comparison, Moore’s Law had a 2-year doubling period). Since 2012, this metric has grown by more than 300,000x (a 2-year doubling period would yield only a 7x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.

We’ve updated our analysis⁠ with data that span 1959 to 2012. Looking at the data as a whole, we clearly see two distinct eras of training AI systems in terms of compute-usage: (a) a first era, from 1959 to 2012, which is defined by results that roughly track Moore’s law, and (b) the modern era, from 2012 to now, of results using computational power that substantially outpaces macro trends. The history of investment in AI broadly is usually told as a story of booms and busts, but we don’t see that reflected in the historical trend of compute used by learning systems. It seems that AI winters and periods of excitement had a small effect on compute used to train modelsB over the last half-century.

Starting from the perceptron⁠(opens in a new window) in 1959, we see a ~2-year doubling time for the compute used in these historical results—with a 3.4-month doubling time starting in ~2012. It’s difficult to draw a strong conclusion from this data alone, but we believe that this trend is probably due to a combination of the limits on the amount of compute that was possible to use for those results and the willingness to spend on scaling up experiments. C

We followed the same methodology outlined in the original post for this updated analysis. When possible, we programmatically counted the number of FLOPs in the results by implementing the models directly. Since computer architectures varied historically and many papers omitted details of their computational setup, these older data points are more uncertain (our original analysis of post-2012 data aimed to be within a factor of 2–3, but for these pre-2012 data points we aim for an order of magnitude estimate). We’ve also created graphs that provide additional views on the data: one graph lays out compute usage in fundamentals, speech, language, vision, and games over time and another visualizes the error-bar estimates around each data point.

We’re very uncertain about the future of compute usage in AI systems, but it’s difficult to be confident that the recent trend of rapid increase in compute usage will stop, and we see many reasons that the trend could continue⁠. Based on this analysis, we think policymakers should consider increasing fundingD for academic research into AI, as it’s clear that some types of AI research are becoming more computationally intensive and therefore expensive.

Compute Scaling

Footnotes

A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kW-hr for energy. We don’t measure peak theoretical FLOPS of the hardware but instead try to estimate the number of actual operations performed. We count adds and multiplies as separate operations, we count any add or multiply as a single operation regardless of numerical precision (making “FLOP” a slight misnomer), and we ignore ensemble models⁠(opens in a new window). Example calculations that went into this graph are provided in this appendix⁠. Doubling time for line of best fit shown is 3.4 months.

Just as in the original analysis, we focus on the costs to train models. This doesn’t include AI systems like expert systems, which attracted substantial investment in the first era.

For one vivid account of the history of computing in AI in this period, see the “False Start” section in Hans Moravec’s article⁠(opens in a new window).

We’ve already advocated for additional funding for academia in our testimony in Congress⁠(opens in a new window) this year, and for the creation of dedicated compute clusters to help academia and industry collaboratively benchmark and assess the safety of AI systems in response to a request for information from NIST⁠(opens in a new window)

Original post

Dario Amodei, Danny Hernandez

Addendum

Girish Sastry, Jack Clark, Greg Brockman, Ilya Sutskever

Acknowledgments

The authors thank Katja Grace, Geoffrey Irving, Jack Clark, Thomas Anthony, and Michael Page for assistance with this post.