Generated using AI

How AI is emerging as a teammate for mathematicians

Amaury Hayat
Professor at École Nationale des Ponts et Chaussées (IP Paris) and Specialist in AI for Mathematics
Key takeaways
  • AI models produce reasoning comparable to that of a first-year PhD student, but do not yet exceed the level of human mathematicians.
  • Numerous obstacles in AI methods still hinder research in fields with limited data.
  • Even without providing complete solutions, AI can already serve as an assistant in several different fields.
  • For example, Math_inc's Gauss formalised the Strong Prime Number Theorem (PNT) in Lean, with 25,000 lines of code and over 1,000 theorems and definitions.
  • In the short term, researchers will use platforms to generate and verify proofs, while in the long term, systems could combine automatic generation and formal verification.

Recent advances in automated reasoning have placed artificial intelligence at the heart of new studies in mathematics. Systems such as Goedel-Prover, designed to generate and validate formal proofs, showed significant progress in 2025, achieving high levels of performance on specialised test sets1. At the same time, APOLLO highlighted the ability of a model to collaborate with the Lean proof assistant to produce verifiable proofs, while automatically revising certain erroneous steps2.

The challenges associated with these advances were examined at the AI and Mathematics conference organised in April 2025 by Hi! PARIS, which highlighted the growing role of formal verification in current research3. This work is consistent with certain analyses published in the scientific press. In 2024, Le Monde highlighted the potential of these tools to accelerate the production of results, referring to the extensive use of models to explore avenues that are difficult to access using traditional approaches4. For its part, Science & Vie reported in 2025 that certain systems had achieved high scores in Olympiad-type competitions, demonstrating their ability to deal with structured problems5, even if these scenarios only partially reflect the requirements of mathematical research.

To shed light on these dynamics while separating fact from fiction, Amaury Hayat, professor at École des Ponts ParisTech (IP Paris) and researcher in applied mathematics and artificial intelligence, shares his expertise. He heads the DESCARTES project, dedicated to learning methods in contexts where data is scarce, as well as several projects on reinforcement learning applied to automatic proving. His analyses examine current developments in automated reasoning in mathematics, drawing on published data, tools and results.

#1 AI can solve all open mathematical research problems without human intervention: FALSE

Amaury Hayat. Artificial intelligence, particularly models capable of reasoning in natural or formal language, can already produce proofs at the level of a first-year PhD student in certain cases. There are simple situations that it cannot yet handle, but there are also complex problems that it can solve.

We do not yet have systems capable of generating, from start to finish, proofs that surpass the level of human mathematicians. However, AI can already provide very interesting proofs and, in some cases, solve problems that have remained unsolved for a long time. In these cases, models often exploit fragments of solutions scattered across different articles or fields, which they are able to combine effectively thanks to their ability to access and organise the knowledge available on the internet. For example (Figure 1), Goedel-Prover-v2, a formal model, was trained on 1.64 million formalised statements and, in 2025, achieved an accuracy of 88.1% without self-correction and 90.4% with self-correction on the MiniF2F benchmark6.

Figure 1: Comparative representation of the different Goedel-Prover and DeepSeek-Prover models based on performance obtained on miniF2F, PutnamBench, and MathOlympiadBench

#2 The scarcity of formalised data significantly hinders the training of AI models: TRUE

The main obstacles are more related to AI methods than to mathematics itself. In highly specialised fields of research, data is extremely limited, and with current AI training methods this scarcity is a major obstacle. Reinforcement learning can theoretically overcome this problem, but it still requires a significant number of examples to be effective and is particularly relevant when automatic verification is possible.

Today, this verification can be done either in natural language, which limits the types of problems that can be addressed, or in formal language, where a computer can directly confirm whether a proof is correct. The challenge is that current AI systems that express themselves in formal language are not yet at the level of AI systems that express themselves in natural language.
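To picture what it means for a computer to directly confirm a proof, here is a small illustrative Lean 4 example (a sketch assuming Mathlib is available; it is not drawn from any benchmark, and tactic or lemma names may vary across versions). If any step were wrong, the file would simply fail to compile: the Lean kernel itself performs the verification.

```lean
import Mathlib

-- The sum of two even numbers is even.
-- Each tactic step is checked mechanically by the kernel.
theorem even_add_even (m n : ℕ)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  obtain ⟨a, ha⟩ := hm       -- unpack the witness for m
  obtain ⟨b, hb⟩ := hn       -- unpack the witness for n
  exact ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```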

Two avenues are being explored to make progress: increasing the amount of data by automatically translating natural language into formal language, and improving reinforcement learning methods. Combined, these approaches pair the power of natural language models with the rigour of formal verification, providing automatic feedback for trial-and-error training.
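A toy sketch of this trial-and-error setup is given below. The names and the miniature "verifier" are illustrative inventions, not an actual prover API: the point is only that, as with Lean in a real pipeline, the generator's sole feedback is the verifier's yes/no answer.

```python
import random

def verifier(candidate: list[str], target: int) -> bool:
    """Accept a candidate 'proof' (here: a sequence of '+1'/'*2' steps)
    iff applying the steps to 0 reaches the target exactly.
    This plays the role a proof assistant plays in a real pipeline."""
    value = 0
    for step in candidate:
        value = value + 1 if step == "+1" else value * 2
    return value == target

def search(target: int, max_len: int = 8, trials: int = 10_000, seed: int = 0):
    """Sample random candidate step sequences; the verifier's binary
    verdict is the only training signal available to the generator."""
    rng = random.Random(seed)
    for _ in range(trials):
        candidate = [rng.choice(["+1", "*2"])
                     for _ in range(rng.randint(1, max_len))]
        if verifier(candidate, target):
            return candidate
    return None  # no verified candidate found within the budget

proof = search(10)
```

In a real system the generator is a language model and the verifier is Lean, but the loop has the same shape: propose, check automatically, keep only what verifies.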

Goedel-Prover is interesting because it quickly matched the performance of our model on the PutnamBench dataset, which we had initially obtained with a very limited number of examples. Subsequent versions, as well as DeepSeek-Prover, are extremely powerful, particularly for Olympiad-type exercises.

Our objective with DESCARTES is different: we are targeting research-level problems, characterised by a low-data regime. The challenge is therefore to develop models capable of working under these conditions, unlike traditional methods that exploit very large quantities of examples.

#3 AI can serve as an effective assistant in scientific collaboration: TRUE

For most of the problems I test, even the best models do not provide a complete solution. However, AI does provide interesting answers on a fairly regular basis. Although it is not as reliable as consulting an expert colleague, the advantage is that you get a single interface capable of handling multiple domains simultaneously, which saves a lot of time. Sometimes, it is almost like having a highly experienced colleague on hand to assess what is feasible. For example, Math, Inc.'s Gauss formalised the Strong Prime Number Theorem (PNT) in Lean, producing approximately 25,000 lines of code and over 1,000 theorems and definitions7, while Prover Agent uses a feedback loop to generate auxiliary lemmas and complete proofs8. These models significantly reduce the time spent on exploration and verification, while enhancing the rigour of the results.
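The auxiliary-lemma strategy mentioned above can be pictured with a small Lean 4 sketch (illustrative only, assuming Mathlib; `sq_nonneg` and `add_nonneg` are Mathlib lemmas whose names may differ across versions): two easy intermediate facts are established first, then combined to close the main goal.

```lean
import Mathlib

-- Illustrative decomposition: prove two auxiliary facts,
-- then combine them to finish the proof.
theorem sum_of_squares_nonneg (a b : ℤ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have ha : 0 ≤ a ^ 2 := sq_nonneg a  -- auxiliary lemma 1
  have hb : 0 ≤ b ^ 2 := sq_nonneg b  -- auxiliary lemma 2
  exact add_nonneg ha hb              -- combine the two facts
```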

#4 Advanced computing systems truly understand mathematical reasoning when they produce proofs: FALSE

For the DESCARTES project, we work directly in formal language, which allows the computer to verify the correctness of proofs. However, when I use traditional LLMs for mathematical articles outside of this project, I have to read and verify the proofs manually. This saves a lot of time, but there is still a risk: the errors produced by the models are sometimes surprising, unexpected, and do not correspond to those that a human would make. They can sometimes appear very credible while being false, which requires particular vigilance.

#5 AI risks completely replacing mathematics education: FALSE

There are two distinct levels in mathematics education: training up to master's level and post-master's research. When it comes to learning the basics and understanding proofs, AI's ability to prove theorems does not fundamentally challenge traditional teaching methods. Human training in reproducing proofs remains essential for developing reasoning skills.

On the other hand, in research projects or examinations where students have access to all resources, AI can play the role of a personal assistant. For example, in a Master's project, it could help produce proofs of open problems, enriching the learning experience and creativity of students.

#6 Some advanced digital systems can generate almost-complete scientific articles, but their reliability remains variable: UNCERTAIN

In the short term, researchers will use platforms such as ChatGPT, Gemini or Mistral to generate and verify proofs quickly, saving considerable time. Specialised tools are also beginning to emerge, such as Google Labs for exploring Google Scholar or AlphaEvolve for constructing mathematical structures, while Gemini 3 Deep Think and GPT-5-pro (language models) first generate a draft and then use verification tools to identify any errors, based on the available information.

In the medium to long term, systems could combine automatic generation and formal verification, accelerating research and influencing publication patterns and academic expectations. Current models are already capable of producing nearly complete articles based on previous publications, which could transform the production of 'incremental' articles and free up time for more ambitious research. For example, comparative tests show that on the miniF2F benchmark, APOLLO achieves 75% success for automatically generated proofs, while Goedel-Prover achieves 57.6%9. However, some models can produce subtle conceptual errors or repetitive passages, making human supervision essential to validate scientific reliability.

Certain aspects of researchers' careers will certainly need to be rethought, including the publication system, peer review and article writing. Models are now capable of producing high-quality proofs and articles, and this could change the nature and frequency of scientific publications.

Interview by Aicha Fall
1. Lin, Y. et al. 2025. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. arXiv. https://arxiv.org/abs/2502.07640
2. Ospanov, A., Yousefzadeh, R. 2025. APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning. arXiv. https://arxiv.org/abs/2505.05758
3. PEPR IA. 2025. "Retour sur l'événement IA et Mathématiques". Hi! PARIS. https://www.pepr-ia.fr/2025/04/25/retour-sur-levenement-ia-mathematiques-une-thematique-au-coeur-de-la-recherche/
4. Barral, J. 2024. "L'IA va permettre d'accélérer la recherche scientifique, bien plus qu'on ne peut l'imaginer". Le Monde. https://www.lemonde.fr/economie/article/2024/10/12/l-ia-va-permettre-d-accelerer-la-recherche-scientifique-bien-plus-qu-on-ne-peut-l-imaginer_6349704_3234.html
5. Polge, A. 2025. "Des prodiges des maths n'ont rien pu faire face à une IA aux Olympiades internationales". Science et Vie. https://www.science-et-vie.com/sciences-fondamentales/mathematiques/des-prodiges-des-maths-nont-rien-pu-faire-face-a-une-ia-aux-olympiades-internationales-de-mathematiques-218363.html
6. Lin, Y., Tang, S., Lyu, B., Wu, J., Lin, H., Yang, K., Li, J., Xia, M., Chen, D., Arora, S., & Jin, C. 2025. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. arXiv. https://arxiv.org/abs/2502.07640
7. Math, Inc. n.d. Introducing Gauss, an agent for autoformalization. Math, Inc. https://www.math.inc/gauss
8. Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Ren, C., Shen, J., Wu, Y., Xia, Y., Wu, H., Ying, H., Yuan, H., Wei, C., & Xin, H. 2025. Prover Agent: An Agent-based Framework for Formal Mathematical Proofs. arXiv. https://arxiv.org/abs/2506.19923
9. Ou, L., Yin, Y. 2025. Empowering Lightweight MLLMs with Reasoning via Long CoT SFT. arXiv. https://arxiv.org/abs/2509.03321
