Generated using AI
π Science and technology

The reproducibility crisis: is science deceiving us?

Florian Naudet
Psychiatrist, Professor of Clinical Medicine at Université de Rennes, and Senior Member of the Institut Universitaire de France
Larry Vernon Hedges
Professor of Statistics and Data Science, Education and Social Policy, Psychology, and Medical Social Sciences at Northwestern University
Thomas Rhys Evans
Psychologist and Lecturer in Occupational Psychology and Open Research at the University of Greenwich
Key takeaways
  • In 2005, John Ioannidis demonstrated that the probability of an effect reported in a scientific article being real was significantly reduced under certain conditions such as small sample sizes.
  • Numerous studies have shown that researchers’ cognitive biases, particularly preconceptions, can influence their results.
  • Studies aimed at reproducing results (replication) can contribute to the confirmation of scientific knowledge.
  • Open science and transparency initiatives remain limited and coexist with the dominant culture of scientific evaluation rather than replacing it.

The first of the ten cri­ter­ia for ‘Gold Stand­ard Sci­ence’ laid down by the Trump admin­is­tra­tion states that “to be deemed worthy of inform­ing pub­lic policy and under­pin­ning reg­u­lat­ory decisions in the United States, sci­ence should be repro­du­cible.” This require­ment comes at a time when sev­er­al large-scale stud­ies show that research­ers are strug­gling to repro­duce cer­tain res­ults estab­lished by their peers across many dis­cip­lines. How con­cerned should we be? Is it reas­on­able to expect sci­ence to always be repro­du­cible? And how can we over­come this ‘crisis’?

In 2005, an article with the evocative title “Why Most Published Research Findings Are False” [1], published in the journal PLOS Medicine, set the cat among the pigeons. In it, John Ioannidis assessed the reliability of studies that use statistical analysis to test hypotheses, a very common type of study in empirical research. Using simulations, the researcher showed that the probability of an effect reported in a scientific article being real was significantly reduced under certain conditions, including small sample sizes, small observed effects, a large number of hypotheses tested, non-standardised methods, conflicts of interest, or significant competition in the field.
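
The reasoning behind this result can be made concrete with a small calculation. The sketch below assumes the simplest version of the model in the 2005 paper, with no bias term: `prior_odds` is the ratio of true to false hypotheses tested in a field, `power` is the chance of detecting a real effect, and `alpha` is the significance threshold. The function name and the example values are illustrative assumptions, not figures from the article.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of 'significant' findings that reflect a real effect.

    Simplified model without bias: true positives are proportional to
    power * prior_odds, false positives are proportional to alpha.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical field A: one true hypothesis for every false one, large samples.
well_powered = positive_predictive_value(prior_odds=1.0, power=0.80)

# Hypothetical field B: one true hypothesis for every ten false ones, small samples.
underpowered = positive_predictive_value(prior_odds=0.1, power=0.20)

print(f"Well-powered, plausible hypotheses: {well_powered:.0%} of findings real")
print(f"Underpowered, long-shot hypotheses: {underpowered:.0%} of findings real")
```

Under these assumed values, most significant findings in the first field are real, while in the second field a significant finding is more likely to be false than true, even though both use the same significance threshold.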

Massive experimental confirmations

In the years that followed, to document the phenomenon empirically, the Open Science Collaboration [2] set out to replicate 100 experimental studies in psychology [3]. The results, published in 2015, caused a sensation. Whilst 97% of the original studies identified a ‘significant’ effect, only 36% of the replications found one, and the replicated effects were on average half the magnitude of the originals.

Since then, John Ioannidis’s findings have been confirmed across numerous disciplines. A collaboration between the Center for Open Science and Science Exchange examined cancer biology: in the replication studies, effect sizes were on average 85% smaller than in the original results, and only 46% of the effects were successfully replicated [4]. Similar findings emerged from an initiative covering the social, political, economic and psychological sciences. Published in a series of articles in Nature [5] in April 2026, its conclusions show that, in around half of cases, researchers were unable to replicate the results of the original studies.

The problem is therefore a deep-seated one. But does this mean we should conclude that all scientific knowledge deemed definitive in these fields is suspect? Let us be clear from the outset: no. An effect is only considered certain when it has been confirmed by multiple high-quality pieces of evidence, such as randomised controlled trials conducted under proper conditions or meta-analyses with high statistical power. “There is no reason to doubt the effectiveness of current Covid vaccines or the link between lung cancer and smoking, to cite just two examples, as they have been identified in converging research across different disciplines,” comments Florian Naudet.

Nev­er­the­less, the error rate in pub­lished stud­ies is much high­er than pre­vi­ously thought, and this war­rants atten­tion. In a sys­tem where res­ults accu­mu­late and are rap­idly aggreg­ated, the chal­lenge is as much sci­entif­ic as it is col­lect­ive. The aim is to lim­it unne­ces­sary detours in research, avoid wast­ing resources on flimsy leads, and reduce the risk that pub­lic decisions are based on flawed res­ults. Let’s recall, for example, that heads of state pro­moted the use of hydroxy­chloroquine against Cov­id-19 on the basis of ini­tial obser­va­tion­al stud­ies that were still fra­gile, before more robust clin­ic­al tri­als gradu­ally con­cluded that there was no sig­ni­fic­ant clin­ic­al bene­fit. Some of these ini­tial, highly pub­li­cised stud­ies have even been retracted.

As John Ioannidis has pointed out, studies with a low level of evidence (isolated case studies, studies with very small sample sizes, etc.) are the most at risk. “But meta-analyses [6] or randomised trials [7], although considered much more reliable, can also be affected by bias and errors. On homeopathy, for example, the meta-analyses available in the literature do not converge, which should set alarm bells ringing,” explains Florian Naudet. A more in-depth analysis by the Australian National Health and Medical Research Council (NHMRC), synthesising the meta-analyses and systematic reviews already published, has in fact demonstrated the fragility of the overwhelming majority of studies concluding that homeopathy has an effect, and concluded that there is no health condition for which there is sufficient evidence of its efficacy [8]. “In fact, there is no reason why a pharmacological effect of homeopathy should be identified,” continues Florian Naudet.

Questionable practices at fault?

But then how can errors be minimised? Part of the answer lies in tracking down ‘questionable practices’, meaning operations that are not openly fraudulent but are harmful, and which can creep into every stage of the research process, from the formulation of the hypothesis to the interpretation and publication of the results.

They are intrins­ic­ally linked to the cur­rent sys­tem of reward­ing research­ers, for whom pub­lish­ing is syn­onym­ous with exist­ence. Peer-reviewed journ­al art­icles are, in fact, the show­case and bench­mark of com­pet­ence, and the pres­sure to pub­lish weighs heav­ily on the entire research sys­tem. “Prob­lems with repro­du­cib­il­ity are more closely linked to expec­ted norms and stand­ards than to delib­er­ate fraud,” asserts Thomas Rhys Evans.

For example, publishers are reluctant to accept studies concluding that there is no significant effect. This filter deprives the scientific literature of negative results that are valuable for the robustness of knowledge. For various reasons, researchers may also decide not to publish their results. “In 2008, for example, it was shown that one in two studies on antidepressants was not published, and that, in general, the unpublished ones concluded that there was no effect [9].” The interpretation of data can also be exaggerated to embellish an ambiguous or weak result. If the findings lack significance, authors may be tempted to adjust their research design until a significant effect appears (known as p-hacking). Conversely, an unexpected result can lead to the retrospective modification of the initial hypotheses (HARKing). “The scientific approach is meant to be hypothetico-deductive. But in practice, we are much more inductive than we realise,” says Florian Naudet.
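
A toy simulation can illustrate how this filter distorts the literature. It assumes a small true effect, small studies, and a rule that only positive ‘significant’ results get published; all names and parameter values are hypothetical and chosen for illustration only.

```python
import random
import statistics
from math import sqrt


def simulate_published_effects(true_effect=0.2, n_per_group=20,
                               n_studies=5000, seed=1):
    """Simulate many two-group studies and keep the 'publishable' ones.

    Each study's estimate is drawn from its sampling distribution
    (unit variance assumed), then filtered with a z-test at z > 1.96.
    """
    rng = random.Random(seed)
    # Standard error of a difference in means between two groups.
    standard_error = sqrt(2 / n_per_group)
    all_estimates, published = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, standard_error)
        all_estimates.append(estimate)
        if estimate / standard_error > 1.96:  # positive and 'significant'
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_published_effects()
print(f"True effect:                0.20")
print(f"Mean of all studies:        {statistics.mean(all_estimates):.2f}")
print(f"Mean of published studies:  {statistics.mean(published):.2f}")
print(f"Share of studies published: {len(published) / len(all_estimates):.0%}")
```

In this setup, the full set of studies averages out close to the true effect, but the published subset cannot: a small study only crosses the significance threshold when its estimate happens to be large, so the published average overstates the effect. This is one mechanism by which replications can find smaller effects than the originals.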

Numerous studies have also shown that researchers’ cognitive biases, particularly pre-existing beliefs, can influence their results. This may also be the case with conflicts of interest. “Research suggests, for example, that meta-analyses of medical devices and pharmacological treatments [10] funded by industry, or studies on homeopathy [11] conducted by researchers with conflicts of interest, are more likely than others to conclude that there are positive effects,” the researcher continues.

Possible courses of action in the realm of open science

In the face of these questionable practices, what safeguards should be put in place? “There is no single solution. Open access to materials, data and code, as well as pre-registration, can improve transparency and reproducibility,” explains Thomas Rhys Evans. Pre-registration involves declaring, before collecting data, what one intends to do, so that any subsequent changes to the hypotheses or methodology become more visible. “The publication of protocols should be mandated by research funding agencies as well as by scientific journals for the publication of results, as is already the case for clinical trials in over 200 of the most prestigious medical journals,” agrees Larry Vernon Hedges.

“These practices are beginning to spread, and many of us believe that greater transparency will have beneficial effects,” summarises Florian Naudet. “But currently these recommendations are based more on values than on evidence. Will the community agree to comply with these requirements? Will they have the desired effect? This remains to be proven by robust studies.” Such studies will, in turn, need dedicated funding.

A science that must learn to replicate

A minor revolution is also needed in the way we approach replication. The large-scale initiatives mentioned above are invaluable for establishing broad-based assessments, but their methodology is not always suited to evaluating research on a case-by-case basis. However, studies aimed at replicating results (known as replication studies) can contribute to consolidating scientific knowledge in very concrete terms, provided they are ‘well’ designed. “Work, such as that of Harry Collins [12] [Editor’s note: a British sociologist born in 1943], suggests that subtle factors influence replication success, and even the scientists involved may not always know which methodological details are crucial,” says Larry Vernon Hedges.

Strictly speaking, a single replication study is not enough to robustly evaluate a previous result. “Nor is it enough to rely on several replications averaged and compared with the original study. Several independent replications are needed, and each must be compared with the others. The tricky part is determining which replication studies are sufficiently similar (to the original study and to one another) to be relevant, and sufficiently independent to provide new information.”
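
One conventional way to compare several replications with one another is a heterogeneity test such as Cochran's Q, borrowed from meta-analysis. The sketch below illustrates that general idea and is not necessarily the procedure the researchers quoted here would use; the effect estimates and standard errors are invented for the example.

```python
def cochran_q(estimates, standard_errors):
    """Return (pooled estimate, Q statistic, degrees of freedom).

    Q sums the squared deviations of each study from the
    inverse-variance weighted mean. If all studies estimate the same
    effect, Q approximately follows a chi-square distribution with
    k - 1 degrees of freedom.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, q, len(estimates) - 1


# 95th percentile of the chi-square distribution with 2 degrees of freedom.
CHI2_CRITICAL_DF2 = 5.991

# Hypothetical replications that agree with one another.
_, q_consistent, _ = cochran_q([0.30, 0.25, 0.35], [0.1, 0.1, 0.1])

# Hypothetical replications in which one study stands apart.
_, q_divergent, _ = cochran_q([0.60, 0.05, 0.10], [0.1, 0.1, 0.1])

for label, q in [("consistent", q_consistent), ("divergent", q_divergent)]:
    verdict = "heterogeneous" if q > CHI2_CRITICAL_DF2 else "compatible"
    print(f"{label}: Q = {q:.2f} -> {verdict}")
```

A statistic of this kind only addresses whether the results are statistically compatible. It says nothing about whether the studies are similar enough in design to be relevant, or independent enough to add information, which is the harder judgement described above.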

This is a chal­lenge giv­en that dis­cov­ery and nov­elty are highly val­ued by journ­als, to the det­ri­ment of rep­lic­a­tion. “There are also very few incent­ives and fund­ing oppor­tun­it­ies for rep­lic­a­tion stud­ies,” con­firms Flori­an Naudet.

An entire ecosystem in need of reform 

Science’s self-reflec­tion is thus gradu­ally bring­ing to light the strengths and weak­nesses of an entire sys­tem, from which pub­lish­ers, uni­ver­sit­ies, fun­ders and poli­cy­makers must not be excluded. “We increas­ingly ask research­ers to do more with less and to demon­strate impact across all aspects of their work. Yet we should ask why they are rewar­ded for the num­ber of pub­lic­a­tions rather than the qual­ity of their work, and wheth­er they truly have the train­ing, sup­port and infra­struc­ture needed to share their research effect­ively. Neces­sary changes will require action from all parts of the sys­tem — gov­ern­ments, fund­ing bod­ies, research insti­tu­tions and research sup­port staff alike,” con­cludes Thomas Rhys Evans.

Outside the scientific community, initiatives are beginning to emerge: the EU has launched a call for proposals for the replication of studies [13], publishers and funders are gradually tightening their transparency requirements, and public policies, particularly in Europe, are promoting open science. But these developments remain limited and coexist with the dominant culture of scientific evaluation rather than replacing it. Hence, there is still a long way to go before the system can overcome its contradictions.

Anne Orliac
[1] Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124
[2] The Open Science Collaboration describes itself as ‘a loose network of researchers, professionals, citizen scientists, and others with an interest in open science, metascience, and good scientific practices’. http://osc.centerforopenscience.org/pages/about.html
[3] Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015). doi: 10.1126/science.aac4716
[4] The series of publications is published by eLife: https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology
[5] https://www.nature.com/collections/idajfifcfg
[6] A meta-analysis combines the results of several independent studies to obtain a more reliable overall estimate of an effect.
[7] A randomised controlled trial is an experimental study in which participants are randomly assigned to either an intervention group or a control group, so that outcomes in the two groups can be reliably compared.
[8] https://www.hri-research.org/wp-content/uploads/2015/07/NHMRC-Information-Paper-Mar2015.pdf and https://www.hri-research.org/wp-content/uploads/2014/07/Homeopathy-Overview-Report.pdf
[9] Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 17 Jan 2008;358(3):252–60. doi: 10.1056/NEJMsa065779. PMID: 18199864.
[10] For example: Lundh A, Sismondo S, Lexchin J, Busuioc OA, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev. 12 Dec 2012;12:MR000033. doi: 10.1002/14651858.MR000033.pub2. Updated in: Cochrane Database Syst Rev. 16 Feb 2017;2:MR000033. doi: 10.1002/14651858.MR000033.pub3. PMID: 23235689.
[11] Perrier Q, Coste A, Diallo A, Guigui A, Khouri C, Roustit M. Relationship between conflicts of interest and the results of meta-analyses of homeopathy trials. BMJ Evid Based Med. 22 Nov 2023;28(6):426–427. doi: 10.1136/bmjebm-2022-112228. PMID: 37197896.
[12] See in particular Collins HM. Replication of experiments: a sociological comment. Behavioral and Brain Sciences. 1978;1(3):391–392. doi: 10.1017/S0140525X00075567
[13] https://www.horizon-europe.gouv.fr/pillar-iv-advancing-knowledge-era-42402
