Generated using AI

How AI is emerging as a teammate for mathematicians

Amaury Hayat
Professor at École Nationale des Ponts et Chaussées (IP Paris) and Specialist in AI for Mathematics
Key takeaways
  • AI models produce reasoning comparable to that of a first-year PhD student, but do not yet exceed the level of human mathematicians.
  • Numerous obstacles in AI methods still hinder research in fields with limited data.
  • Although it does not yet provide complete solutions, AI can already serve as an assistant in several different fields.
  • For example, Math_inc's Gauss formalised the Strong Prime Number Theorem (PNT) in Lean, with 25,000 lines of code and over 1,000 theorems and definitions.
  • In the short term, researchers will use platforms to generate and verify proofs, while in the long term, systems could combine automatic generation and formal verification.

Recent advances in automated reasoning have placed artificial intelligence at the heart of new studies in mathematics. Systems such as Goedel-Prover, designed to generate and validate formal proofs, showed significant progress in 2025, achieving high levels of performance on specialised test sets1. At the same time, APOLLO highlighted the ability of a model to collaborate with the Lean proof assistant to produce verifiable proofs, while automatically revising certain erroneous steps2.

The challenges associated with these advances were examined at the AI and Mathematics conference organised in April 2025 by Hi! PARIS, which highlighted the growing role of formal verification in current research3. This work is consistent with analyses published in the scientific press. In 2024, Le Monde highlighted the potential of these tools to accelerate the production of results, referring to the extensive use of models to explore avenues that are difficult to reach with traditional approaches4. For its part, Science & Vie reported in 2025 that certain systems had achieved high scores in Olympiad-style competitions, demonstrating their ability to handle structured problems5, even if these scenarios only partially reflect the demands of mathematical research.

To clarify these dynamics and separate fact from fiction, Amaury Hayat, professor at École des Ponts ParisTech (IP Paris) and researcher in applied mathematics and artificial intelligence, shares his expertise. He heads the DESCARTES project, dedicated to learning methods in contexts where data is scarce, as well as several projects on reinforcement learning applied to automated theorem proving. His analyses examine current developments in automated reasoning in mathematics, drawing on published data, tools and results.

#1 AI can solve all open mathematical research problems without human intervention: FALSE

Amaury Hayat. Artificial intelligence, particularly models capable of reasoning in natural or formal language, can already produce proofs at the level of a first-year PhD student in certain cases. There are simple situations that it cannot yet handle, but there are also complex problems that it can solve.

We do not yet have systems capable of generating, from start to finish, proofs that surpass the level of human mathematicians. However, AI can already provide very interesting proofs and, in some cases, solve problems that have remained open for a long time. Such solutions often combine fragments scattered across different articles or fields, which AI can assemble effectively thanks to its ability to access and organise the knowledge available on the internet. For example (Figure 1), Goedel-Prover-V2, a formal model, was trained on 1.64 million formalised statements and, in 2025, achieved an accuracy of 88.1% without self-correction and 90.4% with self-correction on the MiniF2F benchmark6.

Figure 1: Comparative performance of the different Goedel-Prover and DeepSeek-Prover models on miniF2F, PutnamBench, and MathOlympiadBench

#2 The scarcity of formalised data significantly hinders the training of AI models: TRUE

The main obstacles relate more to AI methods than to mathematics itself. In highly specialised fields of research, data is extremely limited, and with current AI training methods this scarcity is a major obstacle. Reinforcement learning can in theory overcome the problem, but it still requires a significant number of examples to be effective, and it is particularly relevant when automatic verification is possible.

Today, this verification can be done either in natural language, which limits the types of problems that can be addressed, or in formal language, where a computer can directly confirm whether a proof is correct. The challenge is that current AI systems that express themselves in formal language are not yet at the level of those that work in natural language.
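To illustrate what "formal language" means in practice, here is a minimal Lean 4 example (using Mathlib; the statement is a textbook identity chosen for illustration, not taken from the systems discussed). The proof assistant checks every step mechanically, which is exactly the property that makes formal output a reliable training signal.

```lean
import Mathlib.Tactic

-- The kernel verifies each step; if any step were wrong, compilation
-- would fail rather than yield a plausible-looking but false proof.
example (a b : ℕ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```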

Two avenues are being explored to make progress: increasing the amount of data by automatically translating natural language into formal language, and improving reinforcement learning methods. Combined, they pair the power of natural language models with the rigour of formal verification, providing automatic feedback for trial-and-error training.
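The trial-and-error setup described above can be sketched abstractly. The toy Python below is illustrative only (all names are invented for this sketch, not taken from Goedel-Prover or DESCARTES): a generator proposes candidates, an automatic verifier returns binary pass/fail feedback, and verified successes steer later proposals. The verifier's verdict plays the same role as a proof checker's reward signal in reinforcement learning; a real system would replace the generator with a language model and the verifier with Lean.

```python
import random

def verifier(candidate: int) -> bool:
    """Automatic check standing in for a formal proof checker:
    is `candidate` an integer root of x^3 - 6x^2 + 11x - 6?"""
    x = candidate
    return x**3 - 6 * x**2 + 11 * x - 6 == 0

def generator(rng: random.Random, verified: list[int]) -> int:
    """Toy 'policy': half the time propose near a previously verified
    success, otherwise explore. A real system would sample an LLM."""
    if verified and rng.random() < 0.5:
        return rng.choice(verified) + rng.randint(-2, 2)
    return rng.randint(-10, 10)

def search(attempts: int = 1000, seed: int = 0) -> set[int]:
    rng = random.Random(seed)
    verified: list[int] = []
    for _ in range(attempts):
        cand = generator(rng, verified)
        if verifier(cand):          # binary reward from the checker
            verified.append(cand)   # feedback steers later proposals
    return set(verified)

roots = search()
print(sorted(roots))  # the polynomial factors as (x-1)(x-2)(x-3)
```

The key design point mirrored here is that the verifier is cheap and trustworthy, so the loop can run unsupervised at scale; the hard part in practice is making the generator's proposals good enough for the reward to be non-zero often enough to learn from.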

Goedel-Prover is interesting because it quickly matched the performance of our model on the PutnamBench dataset, initially obtained with a very limited number of examples. Subsequent versions, as well as DeepSeek-Prover, are extremely powerful, particularly on Olympiad-type exercises.

Our objective with DESCARTES is different: we are targeting research-level problems, characterised by a low-data regime. The challenge is therefore to develop models capable of working under these conditions, unlike traditional methods that exploit very large quantities of examples.

#3 AI can serve as an effective assistant in scientific collaboration: TRUE

For most of the problems I test, even the best models do not provide a complete solution. However, AI does provide interesting answers on a fairly regular basis. Although it is not as reliable as consulting an expert colleague, the advantage is a single interface capable of handling multiple domains simultaneously, which saves a lot of time. Sometimes it is almost like having a highly experienced colleague on hand to assess what is feasible. For example, Math_inc's Gauss formalised the Strong Prime Number Theorem (PNT) in Lean, producing approximately 25,000 lines of code and over 1,000 theorems and definitions7, while Prover Agent uses a feedback loop to generate auxiliary lemmas and complete proofs8. These models significantly reduce the time spent on exploration and verification, while enhancing the rigour of the results.
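As context for the Gauss result: "strong" here refers to a version of the prime number theorem with an explicit error term. One classical strong form (given for illustration; the exact statement formalised in Lean may differ) reads:

```latex
% \pi(x) counts the primes \le x;\quad \mathrm{Li}(x) = \int_2^x \frac{dt}{\ln t}.
% Ordinary PNT:\quad \pi(x) \sim \frac{x}{\ln x}.
% Strong form: there exists a constant c > 0 such that
\pi(x) = \mathrm{Li}(x) + O\!\left( x \, e^{-c\sqrt{\ln x}} \right)
\quad \text{as } x \to \infty .
```

Formalising such a statement requires making every analytic estimate explicit and machine-checkable, which is why the development runs to tens of thousands of lines.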

#4 Advanced computing systems truly understand mathematical reasoning when they produce proofs: FALSE

For the DESCARTES project, we work directly in formal language, which allows the computer to verify the correctness of proofs. However, when I use general-purpose LLMs for mathematical articles outside this project, I have to read and verify the proofs manually. This saves a lot of time, but there is still a risk: the errors produced by the models are sometimes surprising and unexpected, and do not correspond to those a human would make. They can appear very credible while being false, which requires particular vigilance.

#5 AI risks completely replacing mathematics education: FALSE

There are two distinct levels in mathematics education: training up to master's level and post-master's research. When it comes to learning the basics and understanding proofs, AI's ability to prove theorems does not fundamentally challenge traditional teaching methods. Human training in reproducing proofs remains essential for developing reasoning skills.

On the other hand, in research projects or examinations where students have access to all resources, AI can play the role of a personal assistant. For example, in a master's project, it could help produce proofs of open problems, enriching students' learning experience and creativity.

#6 Some advanced digital systems can generate almost-complete scientific articles, but their reliability remains variable: UNCERTAIN

In the short term, researchers will use platforms such as ChatGPT, Gemini or Mistral to generate and verify proofs quickly, saving considerable time. Specialised tools are also beginning to emerge, such as Google Labs for exploring Google Scholar or AlphaEvolve for constructing mathematical structures, while Gemini 3 Deep Think and GPT-5-pro (language models) first generate a draft and then use verification tools to identify any errors, based on the available information.

In the medium to long term, systems could combine automatic generation and formal verification, accelerating research and influencing publication patterns and academic expectations. Current models are already capable of producing nearly complete articles based on previous publications, which could transform the production of 'incremental' articles and free up time for more ambitious research. For example, comparative tests show that on the miniF2F benchmark, APOLLO achieves 75% success for automatically generated proofs, while Goedel-Prover achieves 57.6%9. However, some models can produce subtle conceptual errors or repetitive passages, making human supervision essential to validate scientific reliability.

Certain aspects of researchers' careers will certainly need to be rethought, including the publication system, peer review and article writing. Models are now capable of producing high-quality proofs and articles, and this could change the nature and frequency of scientific publications.

Interview by Aicha Fall
1. Lin, Y. et al. 2025. "Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving". arXiv. https://arxiv.org/abs/2502.07640
2. Ospanov, A., & Yousefzadeh, R. 2025. "APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning". arXiv. https://arxiv.org/abs/2505.05758
3. PEPR IA. 2025. "Retour sur l'événement IA et Mathématiques". Hi! PARIS. https://www.pepr-ia.fr/2025/04/25/retour-sur-levenement-ia-mathematiques-une-thematique-au-coeur-de-la-recherche/
4. Barral, J. 2024. "L'IA va permettre d'accélérer la recherche scientifique, bien plus qu'on ne peut l'imaginer". Le Monde. https://www.lemonde.fr/economie/article/2024/10/12/l-ia-va-permettre-d-accelerer-la-recherche-scientifique-bien-plus-qu-on-ne-peut-l-imaginer_6349704_3234.html
5. Polge, A. 2025. "Des prodiges des maths n'ont rien pu faire face à une IA aux Olympiades internationales". Science et Vie. https://www.science-et-vie.com/sciences-fondamentales/mathematiques/des-prodiges-des-maths-nont-rien-pu-faire-face-a-une-ia-aux-olympiades-internationales-de-mathematiques-218363.html
6. Lin, Y., Tang, S., Lyu, B., Wu, J., Lin, H., Yang, K., Li, J., Xia, M., Chen, D., Arora, S., & Jin, C. 2025. "Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving". arXiv. https://arxiv.org/abs/2502.07640
7. Math, Inc. n.d. "Introducing Gauss, an agent for autoformalization". https://www.math.inc/gauss
8. Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Ren, C., Shen, J., Wu, Y., Xia, Y., Wu, H., Ying, H., Yuan, H., Wei, C., & Xin, H. 2025. "Prover Agent: An Agent-based Framework for Formal Mathematical Proofs". arXiv. https://arxiv.org/abs/2506.19923
9. Ou, L., & Yin, Y. 2025. "Empowering Lightweight MLLMs with Reasoning via Long CoT SFT". arXiv. https://arxiv.org/abs/2509.03321
