
Scientific fraud: analysis of a growing phenomenon

Cyril Labbé
Professor at Université Grenoble-Alpes
Key takeaways
  • A fraudulent article is a deliberately erroneous scientific publication that contains one or more breaches of scientific integrity.
  • Currently, analysis tools are far from covering all cases of fraud, as these are very diverse in nature and each requires specific detectors.
  • The tools used to conceal falsifications leave specific signatures, such as groups of words with specific frequencies.
  • These breaches of trust are not always carried out by isolated individuals but may be linked to cooperation within networks of publishers and authors.
  • One way to tackle scientific fraud would be to strengthen the evaluation system through more preventive measures, or even to rethink its current indicators.

An increase in article retractions, the proliferation of paper mills and predatory journals: scientific literature is undergoing a crisis that threatens confidence in its integrity. Cyril Labbé, a computer science researcher at Université Grenoble-Alpes, untangles the causes that have led to such abuses. He is one of France's leading specialists in automatic fraud detection and, since 2020, has been coordinating the multidisciplinary NanoBubbles project¹, which analyses why, when and how science struggles to correct itself.

How did you start working on scientific fraud?

My interest in the subject emerged in 2009, when bibliometric indicators, such as the h-index, began to be used in the evaluation of researchers. To test the limits of the calculation tools, I created a fake researcher, 'Ike Antkare', and assigned him dozens of automatically generated, aberrant publications. These fake papers cited each other, and Google Scholar temporarily assigned Ike Antkare a higher h-index than Einstein. This experiment convinced me of the value of research into the automatic detection of fraudulent articles.
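To see why a self-citing corpus fools the metric, recall that the h-index is the largest number h such that an author has at least h papers each cited at least h times. The following minimal sketch (ordinary Python, not Google Scholar's actual pipeline; the corpus size is made up for illustration) shows how a ring of mutually citing fake papers inflates it:

```python
def h_index(citation_counts: list[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# 100 fake papers that all cite one another: each receives 99 citations,
# so the fake author's h-index jumps to 99 without a single genuine result.
fake_corpus = [99] * 100
print(h_index(fake_corpus))  # -> 99
```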

What does the term ‘fraudulent article’ cover?

A fraudulent article is generally defined as a scientific publication that is deliberately erroneous, either in part or in its entirety. Breaches of scientific integrity can take many different forms: plagiarism, falsification or fabrication of data, images and results, misuse of citations, purchase of articles produced by paper mills, and so on. In extreme cases, the article, although scientific in appearance, is completely meaningless.

Can we accurately measure the scale of the phenomenon?

By definition, we can only count cases of fraud that are detected, and the available detection tools are far from covering all cases. Hence, we are forced to settle for approximations. For example, I am collaborating with Guillaume Cabanac from Institut de recherche en informatique de Toulouse (IRIT) on a tool called Problematic Paper Screener (PPS)², which is based on nine different detectors. Across the entire scientific literature – approximately 130 million articles – we have identified:

  • more than 912,000 articles containing references to retracted publications, which therefore also deserve to be reviewed (a toy version of this check is sketched after this list),
  • approximately 800 articles containing specific factual errors,
  • more than 21,000 containing meaningless 'tortured expressions' that may be the result of plagiarism,
  • more than 350 completely absurd articles, generated automatically, some of which have been online for years.
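The first of these checks is conceptually simple: compare a paper's reference list against a list of known retractions. The sketch below is a deliberately naive illustration, with invented DOIs and a hard-coded retracted set; a real screener would draw on a curated, continuously updated retraction database.

```python
# Invented DOIs, for illustration only.
RETRACTED_DOIS = {
    "10.1000/example.retracted.1",
    "10.1000/example.retracted.2",
}

def flag_retracted_references(cited_dois: list[str]) -> list[str]:
    """Return the cited DOIs that appear in the retracted set."""
    return [doi for doi in cited_dois if doi in RETRACTED_DOIS]

references = ["10.1000/example.ok.7", "10.1000/example.retracted.1"]
hits = flag_retracted_references(references)
if hits:
    print(f"Worth re-reviewing: cites retracted work {hits}")
```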

These last two figures are all the more alarming because they do not only concern predatory journals – which publish without proper scientific evaluation in exchange for payment – but also renowned publishers such as Springer Nature, Elsevier and Wiley. As such, they reveal profound flaws in the peer review system, which is at the heart of scientific evaluation.

How do you detect these types of fraud?

Each type of fraud requires a specific detector. The tools used to produce or conceal fraud sometimes leave characteristic signatures. For example, some pseudo-scientific text generators use groups of words with a characteristic frequency. Some paraphrasing software, used to mask plagiarism, introduces 'tortured phrases' into the text: 'bosom peril' replaces 'breast cancer', 'man-made consciousness' is used for 'artificial intelligence', etc. Our detectors exploit these flaws. Other teams also track metadata, revealing automated submission patterns, for example. In all cases, developing a detector is a slow and meticulous task. This is hardly surprising: it involves reintroducing human expertise where it has been lacking. And that takes time.
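In its simplest form, a tortured-phrase detector is a lookup against a curated list of known paraphraser fingerprints. The sketch below uses only the two examples quoted above and is not the PPS implementation; real detectors rely on much larger, collaboratively curated lists and more robust matching.

```python
# The two entries come from the examples quoted in the interview;
# real lists are far larger and curated collaboratively.
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "man-made consciousness": "artificial intelligence",
}

def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected expression) pairs found in the text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

sample = "We apply man-made consciousness to the early detection of bosom peril."
for bad, good in find_tortured_phrases(sample):
    print(f"'{bad}' likely stands for '{good}'")
```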

You contacted the ‘reputable’ publishing houses involved in the fraud detected by Problematic Paper Screener. How do they explain such breaches?

They consider themselves victims of dishonest authors or unscrupulous editors and reviewers seeking to advance their careers. Peer review is generally delegated to volunteer researchers who are supervised by a volunteer or paid researcher-editor, who selects the reviewers and makes the final decision to reject or accept the manuscript. Investment in these collective tasks can indeed be used for promotions. But this analysis seems too simplistic to me.

In what way is it too simplistic?

Firstly, because contrary to popular belief, fraud is not always committed by individuals acting alone. A study published in August 2025³ in Proceedings of the National Academy of Sciences (PNAS), a US scientific journal, highlighted numerous cases resulting from cooperation between networks of publishers and authors, as well as the role played by brokers who facilitate the mass submission of fraudulent publications to targeted journals.

Secondly, because publishing houses also have an interest in increasing their publication volume. When they are subscription-based, a large volume allows them to charge more for bundles of journals, and in an 'author-pays' model, more articles mean more revenue. This financial interest can lead them to turn a blind eye to dubious practices – or even encourage them.

How can we explain that fraudulent, even absurd, publications remain online for years without anyone batting an eyelid?

There is undoubtedly a tendency within the scientific community not to report suspicious or absurd articles, either out of caution or lack of interest. But even when articles are reported, retraction can be a long and difficult process. No one, from publishing houses to authors to editors, wants to see their reputation tarnished by a retraction… This leads to reluctance and resistance within publishing houses and among authors, even when the problem is obvious.

What actions are being taken to improve the situation?

Actions are generally preventive rather than corrective. Academic and private initiatives (sometimes within publishing houses themselves) exist to develop fraud detectors that can adapt to the innovations of fraudsters. Publishing houses are strengthening their ethics committees and peer review supervision, and they are collaborating with each other to identify certain practices, such as dual submission, which is most often not accepted. Universities and the academic world are also taking action. In most research master's programmes, for example, training in scientific integrity has become compulsory.

In 2023, an open letter⁴ signed by some fifteen international researchers was sent to the management of the CNRS, criticising its handling of a misconduct case as too lenient. How do you view the desire expressed by some to crack down harder on fraudsters?

This is not a course of action that I favour. In my opinion, it is the current evaluation system that is at the root of the problem, and it is this system that needs to be reformed to achieve lasting results. A purely quantitative evaluation is inherently dangerous, as it is blind to the scientific content of articles.

How can the evaluation system be improved?

It seems impossible today to do away with metrics altogether, but less importance should be placed on them in individual and collective research evaluations. Other types of contributions could be valued: in computer science, for example, the production of code or databases. However, I doubt that the system will change in the short term: the current indicators are too easy to establish and use, including for researchers. This brings us to a paradox: many researchers complain about the pressure they are under and the overvaluation of scientific articles in the researcher evaluation system, yet it is easy and tempting to use these quantitative indicators to justify the relevance of their own work. It is so much simpler to tell a funder that you have been published in a prestigious journal than to explain the benefits of your research for society…

So, should we resign ourselves to an increase in fraud?

Absolutely not. But we must act with patience and clarity. I am convinced that the humanities have a central role to play in this quest for more honest scientific literature. That is why the ERC-Synergy NanoBubbles project brings together researchers from various disciplines: Raphaël Lévy (Université Sorbonne Paris Nord), an expert in nanobiology; Cyrus Mody (Maastricht University), an expert in the history of science; and Willem Halffman (Radboud University), renowned for his work on the functioning of scientific expertise and policy. Together with the project members, computer scientists, sociologists of science and philosophers, we are analysing developments in the academic world and publishing houses to better understand the current situation. This analytical work is essential to identify vulnerabilities, propose targeted preventive and corrective actions, and thus contribute to restoring lasting confidence in the scientific community.

Interview by Anne Orliac
1. https://nanobubbles.hypotheses.org/
2. https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener
3. R.A.K. Richardson, S.S. Hong, J.A. Byrne, T. Stoeger & L.A.N. Amaral, 'The entities enabling scientific fraud at scale are large, resilient, and growing rapidly', Proc. Natl. Acad. Sci. U.S.A. 122 (32) e2420092122 (2025). https://doi.org/10.1073/pnas.2420092122
4. https://deevybee.blogspot.com/2023/02/open-letter-to-cnrs.html
