Scientific fraud: analysis of a growing phenomenon
- A fraudulent article is a deliberately erroneous scientific publication that contains one or more breaches of scientific integrity.
- Current analysis tools are far from covering all cases of fraud, as these are very diverse in nature and each requires a specific detector.
- The tools used to conceal falsifications leave characteristic signatures, such as groups of words with distinctive frequencies.
- These breaches of trust are not always carried out by isolated individuals but may be linked to cooperation within networks of publishers and authors.
- One way to tackle scientific fraud would be to strengthen the evaluation system through more preventive measures, or even to rethink its current indicators.
An increase in article retractions, the proliferation of paper mills and predatory journals: scientific literature is undergoing a crisis that threatens confidence in its integrity. Cyril Labbé, a computer science researcher at Université Grenoble-Alpes, untangles the causes that have led to such abuses. He is one of France’s leading specialists in automatic fraud detection and, since 2020, has been coordinating the multidisciplinary NanoBubbles project [1], which analyses why, when and how science struggles to correct itself.
How did you start working on scientific fraud?
My interest in the subject emerged in 2009, when bibliometric indicators, such as the h‑index, began to be used in the evaluation of researchers. To test the limits of the calculation tools, I created a fake researcher, ‘Ike Antkare’, and assigned him dozens of automatically generated, aberrant publications. These fake papers cited each other, and Google Scholar temporarily assigned Ike Antkare a higher h‑index than Einstein. This experiment convinced me of the value of research into the automatic detection of fraudulent articles.
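For context, the h-index is the largest number h such that an author has h publications each cited at least h times. The minimal Python sketch below (the citation counts are invented for illustration) shows how mechanical the metric is, and hence how a ring of mutually citing fake papers can inflate it:

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # this paper still clears the bar
        else:
            break  # all remaining papers have fewer citations
    return h

# Ten hypothetical papers: a self-citation ring raises every count at once,
# and the index rises with it.
print(h_index([12, 9, 7, 5, 5, 4, 3, 1, 0, 0]))  # -> 5
```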
What does the term ‘fraudulent article’ cover?
A fraudulent article is generally defined as a scientific publication that is deliberately erroneous, either in part or in its entirety. Breaches of scientific integrity can take many different forms: plagiarism; falsification or fabrication of data, images and results; misuse of citations; purchase of articles produced by paper mills; and so on. In extreme cases, the article, although scientific in appearance, is completely meaningless.
Can we accurately measure the scale of the phenomenon?
By definition, we can only count cases of fraud that are detected, and the available detection tools are far from covering all cases. Hence, we are forced to settle for approximations. For example, I am collaborating with Guillaume Cabanac from the Institut de recherche en informatique de Toulouse (IRIT) on a tool called the Problematic Paper Screener (PPS) [2], which is based on nine different detectors. Across the entire scientific literature – approximately 130 million articles – we have identified:
- more than 912,000 articles containing references to retracted publications, which therefore also deserve to be reviewed,
- approximately 800 articles containing specific factual errors,
- more than 21,000 articles containing meaningless ‘tortured phrases’ that may be the result of plagiarism,
- more than 350 completely absurd articles, generated automatically, some of which have been online for years.
These last two figures are all the more alarming because they do not only concern predatory journals – which publish without proper scientific evaluation in exchange for payment – but also renowned publishers such as Springer Nature, Elsevier and Wiley. This reveals profound flaws in the peer review system, which is at the heart of scientific evaluation.
How do you detect these types of fraud?
Each type of fraud requires a specific detector. The tools used to produce or conceal fraud sometimes leave characteristic signatures. For example, some pseudo-scientific text generators use groups of words with a characteristic frequency. Some paraphrasing software, used to mask plagiarism, introduces ‘tortured phrases’ into the text: ‘bosom peril’ replaces ‘breast cancer’, ‘man-made consciousness’ stands in for ‘artificial intelligence’, and so on. Our detectors exploit these flaws. Other teams also track metadata, which can reveal automated submission patterns, for example. In all cases, developing a detector is a slow and meticulous task. This is hardly surprising: it involves reintroducing human expertise where it has been lacking. And that takes time.
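To make the approach concrete, here is a minimal sketch, in Python, of the kind of dictionary lookup a tortured-phrase detector can perform. The phrase list and function below are illustrative, not the actual PPS implementation, which relies on a curated, evolving list of fingerprints applied to millions of full texts:

```python
import re

# Illustrative fingerprints: tortured phrases paired with the established
# terms they tend to replace (an assumed toy list, not the PPS's own).
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "man-made consciousness": "artificial intelligence",
    "counterfeit consciousness": "artificial intelligence",
    "colossal information": "big data",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected term) pairs found in the text."""
    lowered = text.lower()
    return [
        (phrase, expected)
        for phrase, expected in TORTURED_PHRASES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]

sample = "We apply man-made consciousness to screen for bosom peril."
for phrase, expected in flag_tortured_phrases(sample):
    print(f"'{phrase}' likely stands in for '{expected}'")
```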
You contacted the ‘reputable’ publishing houses involved in the fraud detected by Problematic Paper Screener. How do they explain such breaches?
They consider themselves victims of dishonest authors or of unscrupulous editors and reviewers seeking to advance their careers. Peer review is generally delegated to volunteer researchers supervised by a researcher-editor (volunteer or paid) who selects the reviewers and makes the final decision to accept or reject the manuscript. Investment in these collective tasks can indeed count towards promotions. But this analysis seems too simplistic to me.
In what way is it too simplistic?
Firstly, because contrary to popular belief, fraud is not always committed by individuals acting alone. A study published in August 2025 [3] in Proceedings of the National Academy of Sciences (PNAS), a US scientific journal, highlighted numerous cases resulting from cooperation between networks of publishers and authors, as well as the role played by brokers who facilitate the mass submission of fraudulent publications to targeted journals.

Secondly, because publishing houses also have an interest in increasing their publication volume. Under a subscription model, a large volume lets them charge more by selling bundles of journals, and under an ‘author-pays’ model, more articles mean more revenue. This financial interest can lead them to turn a blind eye to dubious practices – or even encourage them.
How can we explain that fraudulent, even absurd, publications remain online for years without anyone batting an eyelid?
There is undoubtedly a tendency within the scientific community not to report suspicious or absurd articles, either out of caution or lack of interest. But even when articles are reported, retraction can be a long and difficult process. No one, from publishing houses to authors to editors, wants to see their reputation tarnished by a retraction… This leads to reluctance and resistance within publishing houses and among authors, even when the problem is obvious.
What actions are being taken to improve the situation?
Actions are generally preventive rather than corrective. Academic and private initiatives (sometimes within publishing houses themselves) exist to develop fraud detectors that can adapt to the innovations of fraudsters. Publishing houses are strengthening their ethics committees and their supervision of peer review, and they are collaborating with each other to identify certain practices, such as dual submission, which most journals prohibit. Universities and the academic world are also taking action: in most research master’s programmes, for example, training in scientific integrity has become compulsory.
In 2023, an open letter [4] signed by some fifteen international researchers was sent to the management of the CNRS, criticising its handling of a misconduct case as too lenient. How do you view the desire expressed by some to crack down harder on fraudsters?
This is not a course of action that I favour. In my opinion, it is the current evaluation system that is at the root of the problem, and it is this system that needs to be reformed to achieve lasting results. A purely quantitative evaluation is inherently dangerous, as it is blind to the scientific content of articles.
How can the evaluation system be improved?
It seems impossible today to do away with metrics altogether, but less importance should be placed on them in individual and collective research evaluations. Other types of contribution could be valued: in computer science, for example, the production of code or databases. However, I doubt that the system will change in the short term: the current indicators are too easy to compute and use, including for researchers. This brings us to a paradox: many researchers complain about the pressure they are under and about the overvaluation of articles in researcher evaluation, yet it is easy and tempting to use these same quantitative indicators to justify the relevance of one's own work. It is so much simpler to tell a funder that you have been published in a prestigious journal than to explain the benefits of your research for society…
So, should we resign ourselves to an increase in fraud?
Absolutely not. But we must act with patience and clarity. I am convinced that the humanities have a central role to play in this quest for more honest scientific literature. That is why the ERC-Synergy NanoBubbles project brings together researchers from various disciplines: Raphaël Lévy (Université Sorbonne Paris Nord), an expert in nanobiology; Cyrus Mody (Maastricht University), an expert in the history of science; and Willem Halffman (Radboud University), renowned for his work on the functioning of scientific expertise and policy. Together with the other project members (computer scientists, sociologists of science and philosophers), we are analysing developments in the academic world and in publishing houses to better understand the current situation. This analytical work is essential to identify vulnerabilities, propose targeted preventive and corrective actions, and thus contribute to restoring lasting confidence in the scientific community.