At the 2024 International Mathematical Olympiad (IMO), one entrant performed well enough to earn a silver medal, except for one thing: it was an AI system. This was the first time AI had achieved medal-level performance in the competition's history. In a paper published in the journal Nature, researchers detail the technology behind this remarkable achievement.
The AI is AlphaProof, a sophisticated program developed by Google DeepMind that learns to solve complex mathematical problems. The achievement at the IMO was impressive enough, but what really makes AlphaProof special is its ability to find and correct errors. While large language models (LLMs) can solve math problems, they often can’t guarantee the accuracy of their solutions. There may be hidden flaws in their reasoning.
AlphaProof is different because every proof it produces is formally verified. It works inside Lean, a proof assistant originally developed at Microsoft Research that acts like a strict teacher checking every logical step. Because the computer itself verifies each proof before accepting it, any answer AlphaProof delivers is guaranteed correct.
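To give a flavor of what Lean verification looks like, here is a tiny illustrative proof, far simpler than anything AlphaProof tackles and not drawn from its competition solutions. Lean will only accept the theorem if the justification checks out step by step:

```lean
-- A statement Lean can machine-check: for any natural numbers a and b,
-- a + b = b + a. The proof appeals to the library lemma Nat.add_comm;
-- if the justification were wrong, Lean would reject it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This all-or-nothing checking is what lets AlphaProof's successes be trusted: a proof either passes the verifier or it does not count.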
Three-stage training process
Training this powerful system to reason at an elite level involved three distinct stages. First, the researchers exposed AlphaProof to about 300 billion tokens of general code and mathematical text to give it a broad understanding of concepts such as logic, mathematical language, and programming structure. Next, it was trained on 300,000 expert-written math proofs that had already been formalized in Lean.
The final stage was where the system learned to solve problems on its own. It was given a massive homework task of 80 million formal math problems to solve. Using Reinforcement Learning (RL), which is based on trial and error, AlphaProof was rewarded for every successful proof. By tackling math problems on such a massive scale, the system taught itself new and complex reasoning strategies that went beyond copying human examples.
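The trial-and-error loop the article describes can be caricatured in a few lines of Python. Everything below is an invented toy, not DeepMind's actual system: the proof "tactics," their hidden success rates, and the epsilon-greedy rule are all assumptions made for illustration. The sketch only shows the core reinforcement-learning idea of rewarding successful proofs so the agent comes to prefer what works:

```python
# Toy sketch of reinforcement learning on proof attempts.
# The tactics and their success rates are hypothetical, chosen for
# illustration; the Lean verifier is stood in for by a coin flip.
import random

random.seed(0)

# Hidden success rates of three hypothetical proof tactics.
TRUE_SUCCESS = {"induction": 0.7, "contradiction": 0.4, "simp": 0.2}

def attempt_proof(tactic):
    """Stand-in for the verifier: a proof attempt either passes or fails."""
    return random.random() < TRUE_SUCCESS[tactic]

def train(episodes=20_000, lr=0.05, eps=0.1):
    value = {t: 0.0 for t in TRUE_SUCCESS}  # learned value per tactic
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best tactic, sometimes explore.
        if random.random() < eps:
            tactic = random.choice(list(value))
        else:
            tactic = max(value, key=value.get)
        reward = 1.0 if attempt_proof(tactic) else 0.0
        value[tactic] += lr * (reward - value[tactic])  # move toward reward
    return value

values = train()
best = max(values, key=values.get)  # the tactic RL learned to prefer
```

The only signal the agent receives is whether a proof succeeded, yet over many episodes its value estimates come to reflect which strategies actually work, a miniature version of how reward on verified proofs shapes behavior at scale.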
For the toughest problems, AlphaProof used a technique the researchers developed called Test-Time RL (TTRL), which creates and solves millions of simplified versions of the target problem until it finds a solution.
“Our work demonstrates that learning at scale from grounded experience produces agents with complex mathematical reasoning strategies, paving the way for a reliable AI tool in complex mathematical problem-solving,” wrote the researchers in their paper.
In addition to solving seemingly intractable math problems, AlphaProof could also be employed by mathematicians to correct their work and help them develop new theories.
Written for you by our author Paul Arnold, edited by Gaby Clark, and fact-checked and reviewed by Robert Egan.
More information:
Thomas Hubert et al., Olympiad-level formal mathematical reasoning with reinforcement learning, Nature (2025). DOI: 10.1038/s41586-025-09833-y
© 2025 Science X Network
Citation:
AI math genius delivers 100% accurate results (2025, November 14)
retrieved 14 November 2025
from https://phys.org/news/2025-11-ai-math-genius-accurate-results.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


