
Is generative AI a winning tool for research?

Arnault Chatelain
PhD Student in Economics at CREST (CNRS/IP Paris)
Key takeaways
  • Scientists are currently testing methods of integrating large language models (LLMs) into research practices, which raises a number of questions.
  • LLMs are effective in detecting the tone of an article or comment, but less so in detecting rhetorical forms.
  • LLMs are most commonly used for text classification in social sciences, changing the way research is conducted.
  • There are risks associated with LLMs, such as the inability to replicate work, lack of data security, and the use of poor-quality data.
  • It is crucial to reflect on AI’s contributions to research through the lens of scientific method.

You have co-authored an article on the dangers of artificial intelligence (AI) in research. Why did you decide to carry out this work?

Arnault Chatelain. Today, scientists are experimenting with large language models (LLMs), which are an important part of AI. Everyone is testing different methods to integrate them into research practices, but many questions remain. For certain applications, these LLMs are very effective. For example, they are good at detecting the tone of an article or comment. However, they become much less effective for more complicated tasks, such as detecting rhetorical forms.

How are scientists using AI in their work?

I will only comment on the field I am familiar with, namely the social sciences, and more specifically economics, sociology and political science. Scientists mainly use LLMs to assist them and process large amounts of text. The first application is fairly generic: reformatting texts, reorganising data tables, writing computer code, etc. The use of ChatGPT-type chatbots saves time, as many users outside scientific research have discovered.

The most common use of LLMs in social sciences is text classification. Previously, the study of large amounts of text was done manually, a very time-consuming process. Today, it is possible to manually annotate a sample of texts and then extend the annotation to a whole corpus using language models. In our computational social science research team, we are trying to detect the use of rare rhetorical forms in the press. We annotate around a hundred articles, and we can then extend our annotations to the entire press corpus. This gives us an overview that would have been impossible to produce without AI. In this sense, this tool increases our possibilities and changes the way we do research.
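To make this annotate-then-extend workflow concrete, here is a minimal, hypothetical sketch in Python: a small open language model embeds the texts, a classifier is trained on the hand-annotated sample, and the trained classifier is then applied to the rest of the corpus. The model name and the data are placeholders, not the team's actual pipeline.

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # ~100 manually annotated articles (1 = rhetorical form present, 0 = absent)
    annotated_texts = ["first annotated article...", "second annotated article..."]  # placeholders
    annotated_labels = [1, 0]                                                        # placeholders
    corpus_texts = ["an unlabelled article...", "another unlabelled article..."]     # placeholders

    # Small, openly available embedding model that runs locally
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X_train = encoder.encode(annotated_texts)
    X_corpus = encoder.encode(corpus_texts)

    # Train on the annotated sample, then extend the annotations to the whole corpus
    classifier = LogisticRegression(max_iter=1000).fit(X_train, annotated_labels)
    predicted_labels = classifier.predict(X_corpus)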

What dangers do you see in using AI for scientific research?

First of all, there is a risk concerning replicability. The replicability of results is essential to the scientific method. However, proprietary models [editor’s note: owned by private companies] evolve and can disappear overnight, as is the case with older versions such as GPT-3.5. This makes it impossible to replicate the work. Another danger concerns data security. For scientists working with sensitive data, such as health data, it is important not to share data with private companies. However, the temptation can be strong in the absence of easily accessible non-proprietary alternatives. To avoid any risk, it would therefore be preferable to use freely accessible models downloaded locally, but this requires adequate infrastructure. Finally, I have observed that models rely on large amounts of data, which can sometimes be of poor quality. We still have a limited understanding of the type of bias that this can produce within models.
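As an illustration of that last point about locally downloaded, freely accessible models (an assumed setup, not a recommendation of a specific model), the sketch below loads an openly licensed classifier, pins its exact revision so the same weights can be reloaded later for replication, and runs inference locally so sensitive text never leaves the machine.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "distilbert-base-uncased-finetuned-sst-2-english"  # example of an openly available model
    REVISION = "main"  # in practice, pin a specific commit hash so the run can be replicated later

    tokenizer = AutoTokenizer.from_pretrained(MODEL, revision=REVISION)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, revision=REVISION)

    # Local inference: no text is sent to an external API
    inputs = tokenizer("Sensitive text stays on this machine.", return_tensors="pt")
    with torch.no_grad():
        probabilities = model(**inputs).logits.softmax(dim=-1)
    print(probabilities)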

What are the causes of these limitations?

With proprietary models, the problem is precisely that we do not have control over the model we are using. Another issue stems from the fact that we do not fully understand how LLMs work, whether they are proprietary or open source. Even when we have access to the code, we are unable to explain the results obtained by AI. It has been demonstrated that by repeating the same tasks on the same model for several months, the results vary greatly and cannot be reproduced1.

Following a series of articles claiming that generative AI could respond to surveys in place of humans, my colleagues have recently highlighted significant and unpredictable variability in simulations of responses to an opinion questionnaire2. They refer to this problem as “machine bias”.
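A toy illustration of this kind of variability check (not the method used in the cited study, and with a placeholder model and prompt): ask a locally run generative model the same opinion item many times with sampling enabled and look at the spread of answers.

    from collections import Counter
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # placeholder open model
    prompt = "On a scale from 1 (strongly disagree) to 5 (strongly agree), I would answer:"

    # Repeat the same question and collect the sampled answers
    answers = []
    for _ in range(20):
        output = generator(prompt, max_new_tokens=5, do_sample=True)[0]["generated_text"]
        answers.append(output[len(prompt):].strip())

    # A wide spread of answers signals unstable, hard-to-replicate responses
    print(Counter(answers))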

And regarding the danger of proprietary AI, isn’t it possible to get around the problem by working with open-source AI?

Of course, it is possible to replicate an experiment using open-source models, although this does not solve the problem of explainability mentioned above. We could, for example, consider using open-access models by default and only using proprietary models when absolutely necessary, as some have suggested3. An article published in 2024 highlights the value of creating an open-access infrastructure for sociological research to address this issue4. However, this raises questions about the proliferation of models, the storage space required and the environmental cost. It also requires suitable and easily accessible infrastructure.

Are there other safeguards for the proper use of AI in research?

There is a real need for better training for scientists: how AI models work, their limitations, how to use them properly, etc. I think scientists need to be made aware of the dangers of AI, without demonising it, as it can be useful for their work.

Didn’t scientists ask themselves these questions when language models first appeared?

Questions about the dangers of LLMs for research, or the best practices to implement, are fairly recent. The first wave of work was marked by enthusiasm from the social science community. That’s what prompted us to publish our article.

Today, there is growing interest in evaluating language models, but it is a complex issue. Until now, it has mainly been the computer science community that has taken on the task of testing the performance of models, particularly because it requires a certain amount of technical expertise. This year, I worked in a team of computer scientists, linguists and sociologists to better incorporate the needs of the social sciences into AI evaluation criteria5. This involves paying closer attention to the nature of the test data used. Does good performance on tweets guarantee similar performance on news articles or speeches?
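One simple way to take the nature of the test data into account, sketched below with placeholder labels and predictions, is to report a model's score separately for each type of text rather than as a single pooled number.

    from sklearn.metrics import f1_score

    # Gold labels and model predictions grouped by text type (placeholder values)
    by_domain = {
        "tweets":   {"gold": [1, 0, 1, 1], "pred": [1, 0, 1, 0]},
        "articles": {"gold": [0, 0, 1, 1], "pred": [1, 0, 0, 1]},
        "speeches": {"gold": [1, 1, 0, 0], "pred": [1, 1, 0, 1]},
    }

    # One score per domain: good performance on tweets does not certify the others
    for domain, data in by_domain.items():
        print(domain, f1_score(data["gold"], data["pred"]))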

As for the replicability of studies, this was already a crisis in the social sciences; AI is intensifying the discussion around it.

Should we stop or continue to use AI in research?

I think it is essential to reflect on the contributions of AI. Is it of real benefit to research? This requires reliable, scientifically based measurement of the resilience of language models. Another prerequisite is the establishment of a rigorous framework for the use of AI in research. Finally, we need to ask ourselves how dependent the scientific community is on private actors. This carries many risks, particularly for research strategy. If scientists focus on work where AI can help them, this will influence the direction of their research.

Interview by Anaïs Marechal
1. https://arthurspirling.org/documents/BarriePalmerSpirling_TrustMeBro.pdf
2. https://journals.sagepub.com/doi/10.1177/00491241251330582
3. https://www.nature.com/articles/s43588-023-00585-1
4. https://www.pnas.org/doi/10.1073/pnas.2314021121
5. https://pantagruel.imag.fr/
