How to spot errors in medical research publications
- With several million scientific articles published every year, cleaning up the scientific literature is like looking for a needle in a haystack.
- Focusing solely on high-impact publications is the approach adopted by The Medical Evidence Project, an American initiative funded by a $900,000 start-up grant.
- Analysts primarily use statistical methods to identify inconsistencies in the data and conclusions presented in the articles.
- GRIM-U is a tool capable of detecting a specific type of mathematical inconsistency in certain statistical tests.
Launched in June 2025 with funding from the philanthropic organisation Coefficient Giving, The Medical Evidence Project aims to detect unreliable articles, fraudulent or otherwise, that are likely to influence current medical recommendations. We discuss this with Alice Dreger, PhD, editor for the project.
Cleaning up the scientific literature: a targeted approach
With several million scientific articles published each year in peer-reviewed journals, cleaning up the scientific literature is like looking for a needle in a haystack. Where should one start to track down anomalies? Whilst some advocate a systematic analysis of the entire literature to identify traces left by certain automated fraudulent practices, others prefer a much more targeted assessment that concentrates solely on high-impact publications. This second approach is the one adopted by The Medical Evidence Project, an exploratory US initiative funded by a $900,000 seed grant from Coefficient Giving. Formerly known as Open Philanthropy, this organisation champions “effective” altruism, meaning it supports initiatives based on their expected measurable impact.
The project brings together permanent members and consultants led by James Heathers, PhD, a research associate at Linnaeus University in Sweden who has been working outside the traditional academic framework for several years. It is hosted by the Centre for Scientific Integrity. This American non-profit organisation, launched in 2014, is known as the parent organisation of Retraction Watch, a news site that tracks and documents retractions of scientific articles.

The Medical Evidence Project has therefore made effectiveness its main focus, choosing to concentrate on articles that support current medical recommendations and have a direct and significant influence on patient morbidity and mortality, for example those incorporated into official clinical guidelines or those that influence standard medical practice. James Heathers sums up this objective on his LinkedIn page with a catchy slogan: “Finding bad medical evidence before it kills people.”
The team’s work is based on forensic metascience. “Forensic metascience is kind of what it sounds like: it’s detective work (forensics) using scientific tools (science) to examine the scientific literature (a meta move),” explains Alice Dreger. In practice, analysts mainly use statistical methods to identify inconsistencies in the data and conclusions presented in articles. As James Heathers, who formalised the concept in an article entitled Introduction to Forensic Metascience [1], points out, “Forensic metascientific analysis is designed to modify trust by evaluating research consistency. It is not designed to ‘find fraud’. While this may happen, it is not the sole focus of forensic metascience as a research area and practice, it is simply the loudest consequence.”
GRIM‑U: an initial tool for detecting mathematical inconsistencies
Just one year after its launch, it must be admitted that the project’s visible results are still limited. Alice Dreger, however, sees nothing surprising in this: “The first year of our project was designed from the outset to focus on the development of a system to figure out where our analysts should be focusing efforts in terms of deep analysis.” The team has in fact announced the development of an initial tool, GRIM‑U [2], capable of detecting a specific type of mathematical inconsistency in certain statistical tests, the so-called “Mann-Whitney U tests”, which can be used in clinical research to compare two independent groups of patients. “GRIM‑U was recently used to inform Retraction Watch reporting on the work of one particular surgeon.” The tool reportedly detected statistical anomalies in a short article supporting the efficacy of a surgical device developed by the practitioner and his colleagues [3].
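To give a sense of what a mathematical inconsistency in a Mann-Whitney U test can look like, here is a minimal Python sketch. It is not GRIM‑U itself, whose method is not described in this article; it only applies two textbook properties of the U statistic: for groups of sizes n1 and n2, U lies between 0 and n1 × n2, and it is either a whole number or a half (tied pairs count for 0.5). The function names `u_is_possible` and `complementary_u` are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction


def u_is_possible(u, n1, n2):
    """Return True if a reported U value is arithmetically possible.

    Illustrative check only (an assumption, not the GRIM-U algorithm):
    U counts pairwise "wins" between the two groups, with ties worth 0.5,
    so 2*U must be a whole number and U cannot exceed n1 * n2.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    u = Fraction(str(u))  # exact arithmetic, avoids float rounding
    on_grid = (2 * u).denominator == 1
    in_range = 0 <= u <= n1 * n2
    return on_grid and in_range


def complementary_u(u, n1, n2):
    """Return the other group's U, using the identity U1 + U2 = n1 * n2."""
    return n1 * n2 - Fraction(str(u))


# Two groups of 5 and 6 patients: U can range from 0 to 30.
print(u_is_possible(12.5, 5, 6))  # True
print(u_is_possible(31, 5, 6))    # False: larger than n1 * n2
print(u_is_possible(12.3, 5, 6))  # False: not a multiple of 0.5
```

A reported value that fails such a check cannot have come from the stated group sizes, which is a reason to look more closely at the article rather than proof of misconduct.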
But the researcher makes no secret of the fact that developing effective tools is a complex undertaking that faces major challenges, notably “legal restrictions on the scraping of information – restrictions designed to protect intellectual property rights – and the challenge of tuning detection code to avoid false positives”. The team can also count on a highly collaborative community and will not hesitate to use tools developed by other contributors if necessary. It remains to be seen, over the next twelve months, whether this targeted strategy will produce results that live up to its ambitions.

