
Is generative AI a winning tool for research?

Arnault Chatelain
PhD Student in Economics at CREST (CNRS/IP Paris)
Key takeaways
  • Scientists are currently testing methods of integrating large language models (LLMs) into research practices, which raises a number of questions.
  • LLMs are effective in detecting the tone of an article or comment, but less so in detecting rhetorical forms.
  • LLMs are most commonly used for text classification in social sciences, changing the way research is conducted.
  • There are risks associated with LLMs, such as the inability to replicate work, lack of data security, and the use of poor-quality data.
  • It is crucial to reflect on AI’s contributions to research through the lens of the scientific method.

You have co-authored an article on the dangers of artificial intelligence (AI) in research. Why did you decide to carry out this work?

Arnault Chatelain. Today, scientists are experimenting with large language models (LLMs), which are an important part of AI. Everyone is testing different methods to integrate them into research practices, but many questions remain. For certain applications, these LLMs are very effective. For example, they are good at detecting the tone of an article or comment. However, they become much less effective for more complicated tasks, such as detecting rhetorical forms.

How are scientists using AI in their work?

I will only comment on the field I am familiar with, namely the social sciences, and more specifically economics, sociology and political science. Scientists mainly use LLMs to assist them and process large amounts of text. The first application is fairly generic: reformatting texts, reorganising data tables, writing computer code, etc. The use of ChatGPT-type chatbots saves time, as many users outside scientific research have discovered.

The most common use of LLMs in the social sciences is text classification. Previously, the study of large amounts of text was done manually, a very time-consuming process. Today, it is possible to manually annotate a sample of texts and then extend those annotations to a whole corpus using language models. In our computational social science research team, we are trying to detect the use of rare rhetorical forms in the press. We annotate around a hundred articles, and we can then extend our annotations to the entire press corpus. This gives us an overview that would have been impossible to produce without AI. In this sense, this tool increases our possibilities and changes the way we do research.
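A minimal sketch of this kind of workflow, assuming a locally run open-access model; the model name, labels and example texts are placeholders for illustration, not the team’s actual pipeline:

```python
from transformers import pipeline

# Placeholder labels: in practice they follow the manual annotation scheme.
LABELS = ["uses the rhetorical form", "does not use the rhetorical form"]

# Any open NLI model stored locally can be used; this one is a common default.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def annotate(texts):
    """Return the most likely label for each text."""
    results = classifier(texts, candidate_labels=LABELS)
    results = results if isinstance(results, list) else [results]
    return [r["labels"][0] for r in results]

# 1) Check the model against the hand-annotated sample (two toy examples here).
sample = ["Is this not the very definition of progress?",
          "The committee met on Tuesday to review the budget."]
gold = [LABELS[0], LABELS[1]]
predicted = annotate(sample)
agreement = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
print(f"Agreement with manual annotations: {agreement:.0%}")

# 2) Only if the agreement is acceptable, apply annotate() to the full corpus.
```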

What dangers do you see in using AI for scientific research?

First of all, there is a risk concerning replicability. The replicability of results is essential to the scientific method. However, proprietary models [editor’s note: owned by private companies] evolve and can disappear overnight, as is the case with older versions of ChatGPT 3.5. This makes it impossible to replicate the work. Another danger concerns data security. For scientists working with sensitive data, such as health data, it is important not to share data with private companies. However, the temptation can be strong in the absence of easily accessible non-proprietary alternatives. To avoid any risk, it would therefore be preferable to use freely accessible models downloaded locally, but this requires adequate infrastructure. Finally, I have observed that models rely on large amounts of data, which can sometimes be of poor quality. We still have a limited understanding of the type of bias that this can produce within models.
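As a rough illustration of what “downloaded locally” can mean in practice, the sketch below assumes the weights of an open model are already stored on a machine the researchers control (the path and prompt are placeholders); the offline setting prevents any call to an external service, so sensitive text never leaves the machine:

```python
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # refuse any network call to the model hub

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to weights already downloaded on a machine we control.
MODEL_DIR = "./models/some-open-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, local_files_only=True)

# The sensitive text stays on this machine from start to finish.
prompt = "Classify the tone of the following interview excerpt: ..."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```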

What are the causes of these limitations?

With proprietary models, the problem is precisely that we do not have control over the model we are using. Another issue stems from the fact that we do not fully understand how LLMs work, whether they are proprietary or open source. Even when we have access to the code, we are unable to explain the results obtained by AI. It has been demonstrated that by repeating the same tasks on the same model over several months, the results vary greatly and cannot be reproduced1.
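When a model is open and run locally, the controllable sources of variation can at least be pinned down and reported. A minimal sketch of that practice, with a placeholder model and prompt; it does not, of course, make a proprietary and evolving API reproducible:

```python
from transformers import pipeline, set_seed

set_seed(42)  # fix the random number generators used during sampling

generator = pipeline(
    "text-generation",
    model="gpt2",      # placeholder open model
    revision="main",   # in a paper, pin the exact commit hash of the weights used
)

# Greedy decoding (do_sample=False) removes sampling randomness entirely.
out = generator(
    "The press increasingly describes inflation as",
    max_new_tokens=20,
    do_sample=False,
)
print(out[0]["generated_text"])

# Alongside the results, record the model identifier, revision, decoding
# parameters and library versions: the minimum another team needs to rerun it.
```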

Following a series of articles claiming that generative AI could respond to surveys in place of humans, my colleagues recently highlighted significant and unpredictable variability in simulations of responses to an opinion questionnaire2. They refer to this problem as “machine bias”.

And regarding the danger of proprietary AI, isn’t it possible to get around the problem by working with open-source AI?

Of course, it is possible to replicate an experiment using open-source models, although this does not solve the problem of explainability mentioned above. We could, for example, consider using open-access models by default and only using proprietary models when absolutely necessary, as some have suggested3. An article published in 2024 highlights the value of creating an open-access infrastructure for sociological research to address this issue4. However, this raises questions about the proliferation of models, the storage space required and the environmental cost. It also requires suitable and easily accessible infrastructure.

Are there other safeguards for the proper use of AI in research?

There is a real need for better training for scientists: how AI models work, their limitations, how to use them properly, etc. I think scientists need to be made aware of the dangers of AI, without demonising it, as it can be useful for their work.

Didn’t scientists ask themselves these questions when language models first appeared?

Questions about the dangers of LLMs for research, or the best practices to implement, are fairly recent. The first wave of work was marked by enthusiasm from the social science community. That’s what prompted us to publish our article.

Today, there is growing interest in evaluating language models, but it is a complex issue. Until now, it has mainly been the computer science community that has taken on the task of testing the performance of models, particularly because it requires a certain amount of technical expertise. This year, I worked in a team of computer scientists, linguists and sociologists to better incorporate the needs of the social sciences into AI evaluation criteria5. This involves paying closer attention to the nature of the test data used. Does good performance on tweets guarantee similar performance on news articles or speeches?
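One simple way to take the nature of the test data into account is to report performance separately for each text type rather than as a single aggregate score. A minimal sketch with toy labels, purely for illustration:

```python
from collections import defaultdict
from sklearn.metrics import f1_score

# (text type, gold label, predicted label) for each evaluated document -- toy data.
results = [
    ("tweet",  1, 1), ("tweet",  0, 0), ("tweet",  1, 0),
    ("news",   1, 1), ("news",   0, 0), ("news",   1, 0),
    ("speech", 0, 0), ("speech", 1, 1), ("speech", 0, 1),
]

by_type = defaultdict(lambda: ([], []))
for text_type, gold, pred in results:
    by_type[text_type][0].append(gold)
    by_type[text_type][1].append(pred)

# A single aggregate score can hide large gaps between text types.
for text_type, (gold, pred) in sorted(by_type.items()):
    print(f"{text_type:>7}: F1 = {f1_score(gold, pred):.2f}")
```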

As for the replicability of studies, this is a crisis that was already present in the social sciences. AI is reinforcing the discussions around this topic.

Should we stop or continue to use AI in research?

I think it is essential to reflect on the contributions of AI. Is it of real benefit to research? This requires reliable, scientifically based measurement of the resilience of language models. Another prerequisite is the establishment of a rigorous framework for the use of AI in research. Finally, we need to ask ourselves how dependent the scientific community is on private actors. This carries many risks, particularly for research strategy. If scientists focus on work where AI can help them, this will influence the direction of their research.

Interview by Anaïs Marechal
1. https://arthurspirling.org/documents/BarriePalmerSpirling_TrustMeBro.pdf
2. https://journals.sagepub.com/doi/10.1177/00491241251330582
3. https://www.nature.com/articles/s43588-023-00585-1
4. https://www.pnas.org/doi/10.1073/pnas.2314021121
5. https://pantagruel.imag.fr/
