Solving (some) formal math olympiad problems
We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems
We achieved a new state-of-the-art (41.2% vs 29.3%) on the miniF2F(opens in a new window) benchmark, a challenging collection of high-school olympiad problems. Our approach, which we call statement curriculum learning, consists of manually collecting a set of statements of varying difficulty levels (without proof) where the hardest statements are similar to the benchmark we target. Initially our neural prover is weak and can only prove a few of them. We iteratively search for new proofs and re-train our neural network on the newly discovered proofs, and after 8 iterations, our prover ends up being vastly superior when tested on miniF2F.
Formal mathematics is an exciting domain to study because of (i) its richness, letting you prove arbitrary theorems which require reasoning, creativity and insight and (ii) its similarity to games—where AI has been spectacularly successful—in that it has an automated way of determining whether a proof is successful (i.e., verified by the formal system). As demonstrated in the trivial example below, proving a formal statement requires generating a sequence of proof steps, each proof step consisting in a call to a tactic.B
These tactics take mathematical terms as arguments and each tactic call will transform the current statement to prove, into statements that are easier to prove, until nothing is left to prove.
Footnotes
- A
These problems are not standard math exercises, they are used to let the best high-school students from the US (AMC12, AIME) or the world (IMO) compete against each other.
- B
The artifacts accepted by the formal system are low-level (like assembly code) and hard for humans to produce. Tactics are search procedures that generate such artifacts from higher level directives to assist formalization.
Authors
Stanislas Polu, Jesse Michael Han, Ilya Sutskever
Acknowledgments
Thanks to our paper co-authors: Igor Babuschkin, Kunhao Zheng and Mantas Baksys.
Thanks to the students of the Xena Project Discord who helped us formalize proofs and statements (in particular: Antoine Labelle, Hanting Zhang, Shing Tak Lam, Paul Lezeau, Sara Diaz, Nikita Golikov, Yael Dillies, Artem Vasilyev, Ollie Perree, and Yourong Zang).
Thanks in particular to Kevin Buzzard and Daniel Selsam for their support and thoughtful feedback since the very beginning of this project.