
Scientific fraud: analysis of a growing phenomenon

Cyril Labbé
Professor at Université Grenoble-Alpes
Key takeaways
  • A fraudulent article is a deliberately erroneous scientific publication that contains one or more breaches of scientific integrity.
  • Currently, analysis tools are far from covering all cases of fraud, as these are very diverse in nature and each requires specific detectors.
  • The tools used to conceal falsifications leave specific signatures, such as groups of words with characteristic frequencies.
  • These breaches of trust are not always carried out by isolated individuals but may be linked to cooperation within networks of publishers and authors.
  • One solution to tackling scientific fraud would be to strengthen the evaluation system through more preventive measures, or even to rethink its current indicators.

An increase in article retractions, the proliferation of paper mills and predatory journals: scientific literature is undergoing a crisis that threatens confidence in its integrity. Cyril Labbé, a computer science researcher at Université Grenoble-Alpes, untangles the causes that have led to such abuses. He is one of France's leading specialists in automatic fraud detection and, since 2020, has been coordinating the multidisciplinary NanoBubbles project¹, which analyses why, when and how science struggles to correct itself.

How did you start working on scientific fraud?

My interest in the subject emerged in 2009, when bibliometric indicators, such as the h-index, began to be used in the evaluation of researchers. To test the limits of the calculation tools, I created a fake researcher, 'Ike Antkare', and assigned him dozens of automatically generated, aberrant publications. These fake papers cited each other, and Google Scholar temporarily assigned Ike Antkare a higher h-index than Einstein. This experiment convinced me of the value of research into the automatic detection of fraudulent articles.
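As background for this anecdote, the h-index is simple to compute: a researcher has index h if h of their papers have each been cited at least h times, which is exactly the quantity a self-citing cluster like Ike Antkare's can inflate. A minimal sketch in Python; the citation counts below are invented for illustration:

```python
# Minimal h-index computation: a researcher has index h when h of
# their papers have at least h citations each.
def h_index(citations: list[int]) -> int:
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Invented example: six papers whose citations all come from a
# self-citing cluster of generated articles still yield h = 4.
print(h_index([10, 8, 6, 4, 2, 1]))  # -> 4
```

Because the metric counts citations without reading them, a ring of fake papers citing one another raises h just as effectively as genuine recognition does.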

What does the term ‘fraudulent article’ cover?

A fraudulent article is generally defined as a scientific publication that is deliberately erroneous, either in part or in its entirety. Breaches of scientific integrity can take many different forms: plagiarism, falsification or fabrication of data, images and results, misuse of citations, purchase of articles produced by paper mills, and so on. In extreme cases the article, although scientific in appearance, is completely meaningless.

Can we accurately measure the scale of the phenomenon?

By definition, we can only count cases of fraud that are detected, and the available detection tools are far from covering all cases. Hence, we are forced to settle for approximations. For example, I am collaborating with Guillaume Cabanac from the Institut de recherche en informatique de Toulouse (IRIT) on a tool called the Problematic Paper Screener (PPS)², which is based on nine different detectors. Across the entire scientific literature – approximately 130 million articles – we have identified:

  • more than 912,000 articles containing references to retracted publications, which therefore also deserve to be reviewed (the principle behind this detector is sketched after this list),
  • approximately 800 articles containing specific factual errors,
  • more than 21,000 containing meaningless 'tortured expressions' that may be the result of plagiarism,
  • more than 350 completely absurd articles, generated automatically, some of which have been online for years.
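To illustrate the first detector in this list, here is a minimal sketch of the underlying principle: cross-checking a paper's reference list against a database of retracted publications. The DOIs below are invented placeholders, and the actual PPS implementation is certainly more elaborate:

```python
# Illustrative sketch: flag references that point to retracted papers.
# The DOIs here are invented placeholders; a real detector would load
# a curated retraction database with tens of thousands of entries.
RETRACTED_DOIS = {
    "10.9999/placeholder.retracted.001",
    "10.9999/placeholder.retracted.002",
}

def flag_retracted_references(cited_dois: list[str]) -> list[str]:
    """Return the cited DOIs that are known to be retracted."""
    return [doi for doi in cited_dois if doi in RETRACTED_DOIS]

# A paper citing one retracted and one sound (placeholder) publication:
refs = ["10.9999/placeholder.retracted.001", "10.9999/placeholder.sound.042"]
print(flag_retracted_references(refs))  # -> ['10.9999/placeholder.retracted.001']
```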

The last two figures in this list are even more alarming because they do not only concern predatory journals – which publish without proper scientific evaluation in exchange for payment – but also renowned publishers such as Springer Nature, Elsevier and Wiley. This reveals profound flaws in the peer-review system, which is at the heart of scientific evaluation.

How do you detect these types of fraud?

Each type of fraud requires a specific detector. The tools used to produce or conceal fraud sometimes leave characteristic signatures. For example, some pseudo-scientific text generators use groups of words with a characteristic frequency. Some paraphrasing software, used to mask plagiarism, introduces 'tortured phrases' into the text: 'bosom peril' replaces 'breast cancer', 'made-man consciousness' is used for 'artificial intelligence', etc. Our detectors exploit these flaws. Other teams also track metadata, revealing automated submission patterns, for example. In all cases, developing a detector is a slow and meticulous task. This is hardly surprising: it involves reintroducing human expertise where it has been lacking. And that takes time.
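To make the tortured-phrase approach concrete, here is a minimal sketch using only the two substitutions quoted above; a real detector, such as the one behind the PPS, relies on a much larger curated dictionary:

```python
# Minimal sketch of tortured-phrase screening: scan a text for known
# paraphraser substitutions. Only the two pairs quoted in the interview
# are listed; a production detector uses a large curated dictionary.
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "made-man consciousness": "artificial intelligence",
}

def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, likely original term) pairs found in text."""
    lowered = text.lower()
    return [(phrase, original)
            for phrase, original in TORTURED_PHRASES.items()
            if phrase in lowered]

sample = "Early detection of bosom peril using made-man consciousness."
print(find_tortured_phrases(sample))
# -> [('bosom peril', 'breast cancer'),
#     ('made-man consciousness', 'artificial intelligence')]
```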

You contacted the ‘reputable’ publishing houses involved in the fraud detected by Problematic Paper Screener. How do they explain such breaches?

They consider themselves victims of dishonest authors or unscrupulous editors and proofreaders seeking to advance their careers. Peer review is generally delegated to volunteer researchers who are supervised by a volunteer or paid researcher-editor, who selects the reviewers and makes the final decision to reject or accept the manuscript. Investment in these collective tasks can indeed be used for promotions. But this analysis seems too simplistic to me.

In what way is it too simplistic?

Firstly, because contrary to popular belief, fraud is not always committed by individuals acting alone. A study published in August 2025³ in the Proceedings of the National Academy of Sciences (PNAS), a US scientific journal, highlighted numerous cases resulting from cooperation between networks of publishers and authors, as well as the role played by brokers who facilitate the mass submission of fraudulent publications to targeted journals.

Secondly, because publishing houses also have an interest in increasing their publication volume. When they are subscription-based, a large volume allows them to increase order amounts by selling bundles of journals, and in an 'author-pays' model, more articles mean more revenue. This financial interest can lead them to turn a blind eye to dubious practices – or even encourage them.

How can we explain that fraudulent, even absurd, publications remain online for years without anyone batting an eyelid?

There is undoubtedly a tendency within the scientific community not to report suspicious or absurd articles, either out of caution or lack of interest. But even when articles are reported, retraction can be a long and difficult process. No one, from publishing houses to authors to editors, wants to see their reputation tarnished by a retraction… This leads to reluctance and resistance within publishing houses and among authors, even when the problem is obvious.

What actions are being taken to improve the situation?

Actions are generally preventive rather than corrective. Academic and private initiatives (sometimes within publishing houses themselves) exist to develop fraud detectors that can adapt to the innovations of fraudsters. Publishing houses are strengthening their ethics committees and peer-review supervision, and they are collaborating with each other to identify certain practices, such as dual submission, which is most often not accepted. Universities and the academic world are also taking action. In most research master's programmes, for example, training in scientific integrity has become compulsory.

In 2023, an open letter⁴ signed by some fifteen international researchers was sent to the management of the CNRS, criticising its handling of a misconduct case as too lenient. How do you view the desire expressed by some to crack down harder on fraudsters?

This is not a course of action that I favour. In my opinion, it is the current evaluation system that is at the root of the problem, and it is this system that needs to be reformed to achieve lasting results. A purely quantitative evaluation is inherently dangerous, as it is blind to the scientific content of articles.

How can the evaluation system be improved?

It seems impossible today to do away with metrics altogether, but less importance should be placed on them in individual and collective research evaluations. Other types of contributions could be valued: in computer science, for example, the production of code or databases. However, I doubt that the system will change in the short term: the current indicators are too easy to establish and use, including for researchers. This brings us to a new paradox: many researchers complain about the pressure they are under and the overvaluation of scientific articles in the researcher evaluation system, but it is easy and tempting to use these quantitative indicators to justify the relevance of their work. It is so much simpler to tell a funder that you have been published in a prestigious journal than to explain the benefits of your research for society…

So, should we resign ourselves to an increase in fraud?

Absolutely not. But we must act with patience and clarity. I am convinced that the humanities have a central role to play in this quest for more honest scientific literature. That is why the ERC Synergy NanoBubbles project brings together researchers from various disciplines: Raphaël Lévy (Université Sorbonne Paris Nord), an expert in nanobiology; Cyrus Mody (Maastricht University), an expert in the history of science; and Willem Halffman (Radboud University), renowned for his work on the functioning of scientific expertise and policy. Together with the project members, computer scientists, sociologists of science and philosophers, we are analysing developments in the academic world and publishing houses to better understand the current situation. This analytical work is essential to identify vulnerabilities, propose targeted preventive and corrective actions, and thus contribute to restoring lasting confidence in the scientific community.

Interview by Anne Orliac
1. https://nanobubbles.hypotheses.org/
2. https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener
3. R. A. K. Richardson, S. S. Hong, J. A. Byrne, T. Stoeger & L. A. N. Amaral, 'The entities enabling scientific fraud at scale are large, resilient, and growing rapidly', Proc. Natl. Acad. Sci. U.S.A. 122 (32), e2420092122 (2025). https://doi.org/10.1073/pnas.2420092122
4. https://deevybee.blogspot.com/2023/02/open-letter-to-cnrs.html
