
Is generative AI a winning tool for research?

Arnault Chatelain
PhD Student in Economics at CREST (CNRS/IP Paris)
Key takeaways
  • Scientists are currently testing methods of integrating large language models (LLMs) into research practices, which raises a number of questions.
  • LLMs are effective in detecting the tone of an article or comment, but less so in detecting rhetorical forms.
  • LLMs are most commonly used for text classification in social sciences, changing the way research is conducted.
  • There are risks associated with LLMs, such as the inability to replicate work, lack of data security, and the use of poor-quality data.
  • It is crucial to reflect on AI’s contributions to research through the lens of scientific method.

You have co-authored an article on the dangers of artificial intelligence (AI) in research. Why did you decide to carry out this work?

Arnault Chatelain. Today, scientists are experimenting with large language models (LLMs), which are an important part of AI. Everyone is testing different methods to integrate them into research practices, but many questions remain. For certain applications, these LLMs are very effective. For example, they are good at detecting the tone of an article or comment. However, they become much less effective for more complicated tasks, such as detecting rhetorical forms.

How are scientists using AI in their work?

I will only comment on the field I am familiar with, namely the social sciences, and more specifically economics, sociology and political science. Scientists mainly use LLMs to assist them and process large amounts of text. The first application is fairly generic: reformatting texts, reorganising data tables, writing computer code, etc. The use of ChatGPT-type chatbots saves time, as many users outside scientific research have discovered.

The most common use of LLMs in the social sciences is text classification. Previously, the study of large amounts of text was done manually, a very time-consuming process. Today, it is possible to manually annotate a sample of texts and then extend the annotations to a whole corpus using language models. In our computational social science research team, we are trying to detect the use of rare rhetorical forms in the press. We annotate around a hundred articles, and we can then extend our annotations to the entire press corpus. This gives us an overview that would have been impossible to produce without AI. In this sense, this tool increases our possibilities and changes the way we do research.
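The annotate-then-extend workflow described above can be sketched in a few lines. This is a minimal illustration, not the team's actual pipeline: the `classify()` function is a placeholder for a real LLM call (for instance a locally hosted open model), and the keyword rule, labels and example sentences are invented purely so the sketch runs self-contained.

```python
# Sketch of the workflow: annotate a small sample by hand, validate the
# model against it, then extend the labels to the full corpus.

def classify(text: str) -> str:
    """Placeholder for an LLM classifier that flags a target
    rhetorical form; here a toy keyword rule stands in for it."""
    return "rhetorical" if "so to speak" in text.lower() else "plain"

# Step 1: a small hand-annotated sample (~100 articles in practice).
annotated = [
    ("Prices rose, so to speak, overnight.", "rhetorical"),
    ("The committee met on Tuesday.", "plain"),
    ("It was, so to speak, a quiet revolution.", "rhetorical"),
    ("Unemployment fell by two points.", "plain"),
]

# Step 2: check the model against the human annotations before
# trusting it on texts no one will read.
correct = sum(classify(text) == label for text, label in annotated)
accuracy = correct / len(annotated)
assert accuracy >= 0.75, "model disagrees too often with annotators"

# Step 3: only then extend the annotations to the whole corpus.
corpus = [
    "The bank raised rates again.",
    "This was, so to speak, the end of an era.",
]
labels = [classify(text) for text in corpus]
print(labels)  # ['plain', 'rhetorical']
```

The validation step is the point: the model's labels are only as trustworthy as its agreement with the human-annotated sample it is checked against.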

What dangers do you see in using AI for scientific research?

First of all, there is a risk concerning replicability. The replicability of results is essential to the scientific method. However, proprietary models [editor’s note: owned by private companies] evolve and can disappear overnight, as is the case with older versions of ChatGPT 3.5. This makes it impossible to replicate the work. Another danger concerns data security. For scientists working with sensitive data, such as health data, it is important not to share data with private companies. However, the temptation can be strong in the absence of easily accessible non-proprietary alternatives. To avoid any risk, it would therefore be preferable to use freely accessible models downloaded locally, but this requires adequate infrastructure. Finally, I have observed that models rely on large amounts of data, which can sometimes be of poor quality. We still have a limited understanding of the type of bias that this can produce within models.

What are the causes of these limitations?

With proprietary models, the problem is precisely that we do not have control over the model we are using. Another issue stems from the fact that we do not fully understand how LLMs work, whether they are proprietary or open source. Even when we have access to the code, we are unable to explain the results obtained by AI. It has been demonstrated that by repeating the same tasks on the same model for several months, the results vary greatly and cannot be reproduced [1].

Following a series of articles claiming that generative AI could respond to surveys in place of humans, my colleagues have recently highlighted significant and unpredictable variability in simulations of responses to an opinion questionnaire [2]. They refer to this problem as “machine bias”.

And regarding the danger of proprietary AI, isn’t it possible to get around the problem by working with open-source AI?

Of course, it is possible to replicate an experiment using open-source models, although this does not solve the problem of explainability mentioned above. We could, for example, consider using open-access models by default and only using proprietary models when absolutely necessary, as some have suggested [3]. An article published in 2024 highlights the value of creating an open-access infrastructure for sociological research to address this issue [4]. However, this raises questions about the proliferation of models, the storage space required and the environmental cost. It also requires suitable and easily accessible infrastructure.

Are there other safeguards for the proper use of AI in research?

There is a real need for better training for scientists: how AI models work, their limitations, how to use them properly, etc. I think scientists need to be made aware of the dangers of AI, without demonising it, as it can be useful for their work.

Didn’t scientists ask themselves these questions when language models first appeared?

Questions about the dangers of LLMs for research, or the best practices to implement, are fairly recent. The first wave of work was marked by enthusiasm from the social science community. That’s what prompted us to publish our article.

Today, there is growing interest in evaluating language models, but it is a complex issue. Until now, it has mainly been the computer science community that has taken on the task of testing the performance of models, particularly because it requires a certain amount of technical expertise. This year, I worked in a team of computer scientists, linguists and sociologists to better incorporate the needs of the social sciences into AI evaluation criteria [5]. This involves paying closer attention to the nature of the test data used. Does good performance on tweets guarantee similar performance on news articles or speeches?

As for the replicability of studies, this is a crisis that was already present in the social sciences. AI is reinforcing the discussions around this topic.

Should we stop or continue to use AI in research?

I think it is essential to reflect on the contributions of AI. Is it of real benefit to research? This requires reliable, scientifically based measurement of the resilience of language models. Another prerequisite is the establishment of a rigorous framework for the use of AI in research. Finally, we need to ask ourselves how dependent the scientific community is on private actors. This carries many risks, particularly for research strategy. If scientists focus on work where AI can help them, this will influence the direction of their research.

Interview by Anaïs Marechal
1. https://arthurspirling.org/documents/BarriePalmerSpirling_TrustMeBro.pdf
2. https://journals.sagepub.com/doi/10.1177/00491241251330582
3. https://www.nature.com/articles/s43588-023-00585-1
4. https://www.pnas.org/doi/10.1073/pnas.2314021121
5. https://pantagruel.imag.fr/
