
The reproducibility crisis: is science deceiving us?

Florian Naudet

Psychiatrist, Professor of Clinical Medicine at Université de Rennes, and Senior Member of the Institut Universitaire de France

Larry Vernon Hedges

Professor of Statistics and Data Science, Education and Social Policy, Psychology, and Medical Social Sciences at Northwestern University

Thomas Rhys Evans

Psychologist and Lecturer in Occupational Psychology and Open Research at the University of Greenwich

Key takeaways

In 2005, John Ioannidis demonstrated that the probability of an effect reported in a scientific article being real was significantly reduced under certain conditions such as small sample sizes.
Numerous studies have shown that researchers’ cognitive biases, particularly preconceptions, can influence their results.
Studies aimed at reproducing results (replication studies) can help consolidate scientific knowledge.
Open science and transparency initiatives remain limited and coexist with the dominant culture of scientific evaluation rather than replacing it.

The first of the ten criteria for ‘Gold Standard Science’ laid down by the Trump administration states that “to be deemed worthy of informing public policy and underpinning regulatory decisions in the United States, science should be reproducible.” This requirement comes at a time when several large-scale studies show that researchers are struggling to reproduce certain results established by their peers across many disciplines. How concerned should we be? Is it reasonable to expect science to always be reproducible? And how can we overcome this ‘crisis’?

In 2005, an article with the evocative title “Why Most Published Research Findings Are False¹,” published in the journal PLOS Medicine, set the cat among the pigeons. In it, John Ioannidis assessed the reliability of studies that use statistical analysis to test hypotheses, a very common type of study in empirical research. Using simulations, the researcher showed that the probability of an effect reported in a scientific article being real was significantly reduced under certain conditions – including small sample sizes, small observed effects, a large number of hypotheses tested, non-standardised methods, conflicts of interest, or significant competition in the field.
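To make the argument concrete, here is a minimal sketch of the positive predictive value (PPV) calculation that underlies this kind of reasoning: the share of 'significant' findings that correspond to a real effect, given the statistical power of the study, the significance threshold, and the odds that a tested hypothesis is true in the first place. The function name and the two scenarios are illustrative assumptions, not figures from the article, and the sketch deliberately leaves out the additional bias term discussed in the 2005 paper.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a 'significant' finding reflects a real effect.

    prior_odds: ratio of true to false hypotheses tested in a field (assumed).
    power:      chance that a study detects a real effect (1 - beta).
    alpha:      significance threshold, i.e. false-positive rate per test.
    """
    true_positives = power * prior_odds  # real effects that reach significance
    false_positives = alpha              # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios, chosen only for illustration.
well_powered = positive_predictive_value(prior_odds=1.0, power=0.8)
underpowered = positive_predictive_value(prior_odds=0.1, power=0.2)
print(f"well-powered confirmatory study: {well_powered:.2f}")
print(f"small exploratory study:         {underpowered:.2f}")
```

Under these assumed inputs, a well-powered test of a plausible hypothesis yields a PPV of about 0.94, whereas a small exploratory study of an unlikely hypothesis yields about 0.29: most of its 'significant' results would be false positives.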

Massive experimental confirmations

In the years that followed, to document the phenomenon empirically, the Open Science Collaboration² set out to replicate 100 experimental studies in psychology³. The results, published in 2015, caused a sensation. Whilst 97% of the original studies identified a ‘significant’ effect, only 36% of the replications found one, and the effects they measured were, on average, half the size of the originals.

Since then, John Ioannidis’s findings have been confirmed across numerous disciplines. A collaboration between the Center for Open Science and Science Exchange examined cancer biology. In the replication studies, effect sizes were on average 85% smaller than in the original results, and only 46% of the effects were successfully replicated⁴. The same findings emerged from an initiative covering the social, political, economic and psychological sciences. Published in a series of articles in Nature⁵ in April 2026, its conclusions show that, in around half of cases, researchers were unable to replicate the results of the original studies.

The problem is therefore a deep-seated one. But does this mean we should conclude that all scientific knowledge deemed definitive in these fields is suspect? Let us be clear from the outset: no. An effect is only considered certain when it has been confirmed by multiple high-quality pieces of evidence, such as randomised controlled trials conducted under proper conditions or meta-analyses with high statistical power. “There is no reason to doubt the effectiveness of current Covid vaccines or the link between lung cancer and smoking, to cite just two examples, as they have been identified in converging research across different disciplines,” comments Florian Naudet.

Nevertheless, the error rate in published studies is much higher than previously thought, and this warrants attention. In a system where results accumulate and are rapidly aggregated, the challenge is as much scientific as it is collective. The aim is to limit unnecessary detours in research, avoid wasting resources on flimsy leads, and reduce the risk that public decisions are based on flawed results. Let’s recall, for example, that heads of state promoted the use of hydroxychloroquine against Covid-19 on the basis of initial observational studies that were still fragile, before more robust clinical trials gradually concluded that there was no significant clinical benefit. Some of these initial, highly publicised studies have even been retracted.

As John Ioannidis has pointed out, studies with a low level of evidence (isolated case studies, studies with very small sample sizes, etc.) are the most at risk. “But meta-analyses⁶ or randomised trials⁷, although considered much more reliable, can also be affected by bias and errors. On homeopathy, for example, the meta-analyses available in the literature do not converge, which should set alarm bells ringing,” explains Florian Naudet. A more in-depth analysis, synthesising the meta-analyses and systematic reviews already published, carried out by the Australian National Health and Medical Research Council (NHMRC), has in fact demonstrated the fragility of the overwhelming majority of studies concluding that homeopathy has an effect, and concluded that there is no health condition for which there is sufficient evidence of its efficacy⁸. “In fact, there is no reason why a pharmacological effect of homeopathy should be identified,” continues Florian Naudet.

Questionable practices at fault?

But then how can errors be minimised? Part of the answer lies in tracking down ‘questionable practices’, meaning those operations that are not openly fraudulent but are harmful, and which can creep into every stage of the research process, from the formulation of the hypothesis to the interpretation and publication of the results.

They are intrinsically linked to the current system of rewarding researchers, for whom publishing is synonymous with existence. Peer-reviewed journal articles are, in fact, the showcase and benchmark of competence, and the pressure to publish weighs heavily on the entire research system. “Problems with reproducibility are more closely linked to expected norms and standards than to deliberate fraud,” asserts Thomas Rhys Evans.

For example, publishers are reluctant to accept studies concluding that there is no significant effect. This filter deprives scientific literature of negative results that are valuable for the robustness of knowledge. For various reasons, researchers may also decide not to publish their results. “In 2008, for example, it was shown that one in two studies on antidepressants was not published, and that, in general, the unpublished ones concluded that there was no effect⁹.” The interpretation of data can also be exaggerated to embellish an ambiguous or weak result. If the findings lack statistical significance, authors may be tempted to adjust their research design until a significant effect appears (known as p‑hacking). Conversely, an unexpected result can lead to the retrospective modification of initial hypotheses (HARKing). “The scientific approach is meant to be hypothetico-deductive. But in practice, we are much more inductive than we realise,” says Florian Naudet.
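One common form of p‑hacking is easy to simulate: measure several outcomes when no real effect exists, then report whichever one crosses the significance threshold. The sketch below is an illustration under stated assumptions, not a method taken from the studies cited. It assumes independent outcomes, normally distributed data with known unit variance (hence a simple z-test), and hypothetical function names and sample sizes.

```python
import math
import random


def two_sided_p_value(sample):
    """z-test p-value for 'mean differs from 0', assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_study=20, alpha=0.05, seed=0):
    """Share of null studies reporting at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Every outcome is pure noise: there is no real effect to find.
        p_values = [
            two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n_per_study)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:  # keep only the "best" outcome
            hits += 1
    return hits / n_studies


def expected_rate(n_outcomes, alpha=0.05):
    """Analytic rate for independent tests: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** n_outcomes


for k in (1, 5, 10):
    print(k, round(false_positive_rate(k), 3), round(expected_rate(k), 3))
```

With a single pre-specified outcome, the false-positive rate stays close to the nominal 5%. With ten outcomes to choose from, the analytic rate rises to roughly 40%, which is why declaring the outcome of interest in advance matters.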

Numerous studies have also shown that researchers’ cognitive biases, particularly pre-existing beliefs, can influence their results. This may also be the case with conflicts of interest. “Research suggests, for example, that meta-analyses of medical devices and pharmacological treatments¹⁰ funded by industry, or studies on homeopathy¹¹ conducted by researchers with conflicts of interest, are more likely to conclude that there are positive effects than others,” the researcher continues.

Possible courses of action in the realm of open science

In the face of these questionable practices, what safeguards should be put in place? “There is no single solution. Open access to materials, data and code, as well as pre-registration, can improve transparency and reproducibility,” explains Thomas Rhys Evans. Pre-registration involves declaring, before collecting data, what one intends to do, so that any subsequent changes to the hypotheses or methodology become more visible. “The publication of protocols should be mandated by research funding agencies as well as by scientific journals for the publication of results, as is already the case for clinical trials in over 200 of the most prestigious medical journals,” agrees Larry Vernon Hedges.

“These practices are beginning to spread, and many of us believe that greater transparency will have beneficial effects,” summarises Florian Naudet. “But currently these recommendations are based more on values than on evidence. Will the community agree to comply with these requirements? Will they have the desired effect? This remains to be proven by robust studies.” Such studies will, in turn, require dedicated funding.

A science that must learn to replicate

A minor revolution is also needed in the way we approach replication. The large-scale initiatives mentioned above are invaluable for establishing broad-based assessments, but their methodology is not always suited to evaluating research on a case-by-case basis. However, studies aimed at replicating results (known as replication studies) can contribute to consolidating scientific knowledge in very concrete terms, provided they are ‘well’ designed. “Work, such as that of Harry Collins¹² [Editor’s note: a British sociologist born in 1943], suggests that subtle factors influence replication success, and even the scientists involved may not always know which methodological details are crucial,” says Larry Vernon Hedges.

Strictly speaking, a single replication study is not enough to robustly evaluate a previous result. “Nor is it enough to rely on several replications averaged and compared with the original study. Several independent replications are needed, and each must be compared with the others. The tricky part is determining which replication studies are sufficiently similar (to the original study and to one another) to be relevant, and sufficiently independent to provide new information.”
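One standard way to compare several replications with one another, rather than only with the original, is to check whether their effect estimates disagree more than sampling error alone would explain. The sketch below uses Cochran's Q and the I² statistic, two common meta-analysis measures. The article does not name a specific method, so this choice is an assumption, as are the function name and the effect sizes, which are invented for illustration.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I-squared).

    effects:    effect estimate from each replication.
    std_errors: standard error of each estimate.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Q grows when studies scatter around the pooled effect more than
    # their standard errors would predict.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of the variation attributed to real between-study
    # differences rather than chance (floored at zero).
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i_squared


# Hypothetical replications: two agree, one finds a much smaller effect.
pooled, q, df, i_squared = heterogeneity([0.5, 0.1, 0.45], [0.1, 0.1, 0.1])
print(f"pooled={pooled:.2f}  Q={q:.1f}  df={df}  I^2={i_squared:.0%}")
```

A Q value well above its degrees of freedom suggests the replications are not measuring the same thing, in which case averaging them would hide the disagreement. That is the situation the quotation warns about.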

This is a challenge given that discovery and novelty are highly valued by journals, to the detriment of replication. “There are also very few incentives and funding opportunities for replication studies,” confirms Florian Naudet.

An entire ecosystem in need of reform

Science’s self-reflection is thus gradually bringing to light the strengths and weaknesses of an entire system, from which publishers, universities, funders and policymakers must not be excluded. “We increasingly ask researchers to do more with less and to demonstrate impact across all aspects of their work. Yet we should ask why they are rewarded for the number of publications rather than the quality of their work, and whether they truly have the training, support and infrastructure needed to share their research effectively. Necessary changes will require action from all parts of the system — governments, funding bodies, research institutions and research support staff alike,” concludes Thomas Rhys Evans.

Outside the scientific community, initiatives are beginning to emerge: the EU has launched a call for proposals for the replication of studies¹³, publishers and funders are gradually tightening their transparency requirements, and public policies, particularly in Europe, are promoting open science. But these developments remain limited and coexist with the dominant culture of scientific evaluation rather than replacing it. Hence, there is still a long way to go before the system can overcome its contradictions.