What are the next challenges for AI?

Artificial intelligence: a tool for domination or emancipation?

with Lê Nguyên Hoang, Co-founder and President of Tournesol.app, Victor Berger, Post-doctoral researcher at CEA Saclay and Giada Pistilli, PhD student at Sorbonne University affiliated with the CNRS Science, Norms, Democracy laboratory
On January 17th, 2023 | 5 min reading time
Key takeaways
  • There are three ways to teach artificial intelligence (AI): supervised learning, unsupervised learning, and reinforcement learning.
  • Machine learning algorithms are built to spot patterns, so the slightest hidden bias in a dataset can be exploited and amplified.
  • Generalising from past experience in AI can be problematic because algorithms use historical data to answer present problems.
  • AI is also a field where a great deal of power is at stake: ethical issues, such as how data is used, can emerge.
  • Communities could take ownership of AI, using it as a true participatory tool for emancipation.

Before tackling the issue of AI bias, it is important to understand how a machine learning algorithm works, but also what it means. Victor Berger, a post-doctoral fellow in artificial intelligence and machine learning at CEA-List, explains, “the basic assumption of most machine learning algorithms is that we have data that is supposedly a statistical representation of the problem we want to solve.”

Three main ways of learning 

The simplest – technically speaking – and most common way to teach a machine learning AI is called supervised learning. “For example, if you have a database full of animal pictures, a supervised algorithm will already know that a picture represents a dog, a cat, a chicken, etc., and it will know that for this input it should give a specific response in output. A classic example of this type of algorithm is language translators,” explains Victor Berger.
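To make the mechanism concrete, here is a minimal sketch of supervised learning in the spirit of the animal-picture example. It is not from the interview: the toy feature values, the labels and the choice of a nearest-neighbour classifier are all assumptions made purely for illustration.

```python
# Minimal sketch of supervised learning: each "picture" is reduced to an
# invented two-number feature vector and paired with the answer the
# algorithm is told is correct. The values are illustrative only.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[8.0, 30.0], [7.5, 25.0], [3.0, 4.0], [2.5, 3.5], [1.0, 2.0]]
y_train = ["dog", "dog", "cat", "cat", "chicken"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)          # learn the input -> output mapping

print(model.predict([[3.2, 4.2]]))   # a new, unlabelled input -> ['cat']
```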

The second category of algorithms, unsupervised learning, is generally used when we do not have the solution to a problem: “to continue with the example of animals, an unsupervised learning algorithm will work on a database with the same photos as the previous one, without having precise instructions on how it should react in output with respect to a given input. Its aim is generally to identify statistical patterns within the dataset it is given, for categorisation (or clustering) purposes.”
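A comparable sketch for unsupervised learning, again purely illustrative: the same kind of data, but without labels, handed to a clustering algorithm. The number of clusters and the feature values are assumptions chosen for the example.

```python
# Minimal sketch of unsupervised learning: the algorithm only looks for
# statistical groupings in the data; nothing tells it what the groups mean.
from sklearn.cluster import KMeans

X = [[8.0, 30.0], [7.5, 25.0], [3.0, 4.0], [2.5, 3.5], [1.0, 2.0], [1.2, 1.8]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)       # groups similar points together

print(labels)  # e.g. [1 1 0 0 2 2]: clusters, but nothing says "dog" or "cat"
```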

The whole problem lies in the data sets used to supervise the algorithms.

The third category of algorithms is reinforcement learning: “In the first two categories, the way the algorithm is coded allows it to direct itself and know how to improve. This component is absent in reinforcement learning, where the algorithm just knows whether it has completed its task correctly or not. It has no instructions about which directions to take to become better. In the end, it is the environment and its reaction to the algorithm’s decision making that will act as a guide,” Victor Berger explains.
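The reward-only loop Victor Berger describes can be sketched with a toy two-action “bandit”. Everything here is invented for illustration: the hidden success probabilities stand in for the environment, and the agent improves using nothing but the success or failure it receives back.

```python
# Minimal sketch of reinforcement learning: the agent gets only a reward
# signal from the environment and no instructions on how to improve.
import random

success_prob = {"action_a": 0.3, "action_b": 0.7}   # hidden from the agent
value = {"action_a": 0.0, "action_b": 0.0}          # agent's running estimates
counts = {"action_a": 0, "action_b": 0}

for step in range(1000):
    # explore occasionally, otherwise pick the action that has worked best
    if random.random() < 0.1:
        action = random.choice(list(value))
    else:
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < success_prob[action] else 0.0  # environment feedback
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]       # update estimate

print(value)  # the agent ends up preferring action_b, guided only by rewards
```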

In all three cases, the problem lies in the data sets used to supervise the algorithms. Victor Berger reminds us that “machine learning algorithms enable patterns to be identified. Therefore, the slightest bias hidden in a data set can bias the entire algorithm, which will find the biased pattern, then exploit and amplify it.”

Generalisation of data

For Lê Nguyên Hoang, a doctor in mathematics and co-founder of Tournesol, who popularises the subject of artificial intelligence, the hypothesis of the generalisation of data is omnipresent in the field of machine learning: “Questions relating to the quality of data are largely undervalued. Whether in the research world or in industry, it is the design of algorithms that takes centre stage. But very few people ask themselves whether generalising the past by training algorithms with historical databases that we don’t look at critically is really a viable project for society.”

To better understand how this can manifest itself, Victor Berger refers to a specific anecdote circulating in the machine learning community: “To avoid gender bias, a company using AI to sort CVs excluded information such as names and photos. But they realised that it had retained football as a relevant criterion.” As careful as the company was, it provided its historical data without anticipating a pattern: the most recruited CVs in the past – those of men – were more likely to mention football as an interest. Far from combating the gender bias, the algorithm nurtured it. There are two solutions for dealing with this type of problem: “either humans are tasked with building better-quality databases – but this requires a colossal amount of work – or algorithms are tasked with eliminating the biases already identified,” explains Victor Berger.
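A hedged illustration of the anecdote, with an entirely synthetic dataset in which the proxy effect is deliberately planted: even with gender removed from the features, a correlated hobby carries the historical hiring bias into the model.

```python
# Synthetic sketch of the CV example: gender is NOT a feature, but a
# correlated proxy ("mentions football") remains, so a model trained on
# historically biased decisions still reproduces the bias.
import random
from sklearn.linear_model import LogisticRegression

random.seed(0)
X, y = [], []
for _ in range(2000):
    is_man = random.random() < 0.5
    football = 1 if random.random() < (0.6 if is_man else 0.1) else 0
    years_exp = random.randint(0, 10)
    # historical hiring decisions favour men, independently of experience
    hired = 1 if random.random() < (0.5 if is_man else 0.2) else 0
    X.append([football, years_exp])   # gender itself never appears here
    y.append(hired)

model = LogisticRegression(max_iter=1000).fit(X, y)
# The "football" coefficient comes out strongly positive: the proxy smuggles
# the historical gender bias into the model's decisions.
print(dict(zip(["football", "years_exp"], model.coef_[0].round(2))))
```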

But this does not solve all problems. “If we take the example of content moderation, the labelling of data depends on the concept of freedom of expression that we defend, on what we consider to be or not to be a call to hatred or dangerous false information. These are questions that do not have clear answers and on which there will be disagreements. So, if the problem is not just a technical one, the same goes for the solutions,” says Lê Nguyên Hoang.

Feedback loops

There are also questions about the feedback loops which algorithms can cause. “What you have to bear in mind is that a machine learning algorithm is always prescriptive, because its aim is to achieve a specific objective: maximising presence on a platform, profit, click-through rate, etc.,” points out Lê Nguyên Hoang.

Imagine an algorithm used by a local police force to predict in which neighbourhood there will be the most crime and aggression. Victor Berger argues that “what this algorithm is going to do is make a prediction based on historical police data that identifies the neighbourhoods where the most people have been arrested.” Here again, we fall back on the same flaw: the risk of generalising – or even amplifying – the past. Indeed, this prediction is not only descriptive, but it also leads to decision-making: increasing the number of police officers, increasing video surveillance, etc. Decisions that may concentrate police presence in a particular area of the city, and that can reinforce an already tense climate.
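This loop can be sketched with a toy simulation. The numbers, the equal detection rate and the patrol-allocation rule are assumptions, not a description of any real system: the point is only that recorded arrests follow where officers are sent, which then feeds the next prediction.

```python
# Toy feedback-loop simulation: two neighbourhoods with the same underlying
# crime level, but patrols are allocated to the predicted "hotspot" and
# recorded arrests track patrols, not actual crime. All values are invented.
arrests = {"north": 52, "south": 48}    # slightly unbalanced historical data
ARRESTS_PER_PATROL = 0.5                # same detection rate everywhere

for year in range(5):
    hotspot = max(arrests, key=arrests.get)               # predicted high-crime area
    patrols = {n: (70 if n == hotspot else 30) for n in arrests}
    arrests = {n: patrols[n] * ARRESTS_PER_PATROL for n in arrests}
    print(year, arrests)
# A small initial imbalance locks in: the model keeps "predicting" crime
# wherever more police were sent before.
```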

The phenomena of radicalisation, sectarian movements and conspiracy circles can be amplified.

Similarly, on social media and entertainment platforms, recommendation algorithms are based on the user’s previous choices. Their objective is generally to keep the user’s attention for as long as possible. As a result, the phenomena of radicalisation, sectarian movements and conspiracy theories can be amplified. Lê Nguyên Hoang is working on solving this problem with the help of an algorithm, called Tournesol, whose database is built up in a collaborative manner.1
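One last hedged sketch, not of Tournesol but of the generic “recommend more of what was clicked” heuristic the paragraph describes: the categories, click probabilities and weighting rule are invented, and the only point is how quickly exposure narrows.

```python
# Toy recommendation loop: recommend in proportion to past clicks, and users
# mostly click what they are shown, so one category snowballs over time.
import random
random.seed(1)

categories = ["news", "music", "conspiracy"]
clicks = {c: 1 for c in categories}            # uniform starting history

for step in range(500):
    weights = [clicks[c] for c in categories]
    recommended = random.choices(categories, weights=weights)[0]
    if random.random() < 0.8:                  # users mostly click what is shown
        clicks[recommended] += 1

print(clicks)  # one category ends up dominating the user's feed
```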

Questions of power 

Artificial intelligence is therefore not just a field of scientific study or a technological field of application. A great deal of power is also at stake. “It is very important to analyse and list the various social and ethical problems that can arise from these algorithms, from their training to their design and deployment,” warns Giada Pistilli, a philosophy researcher and senior ethicist at Hugging Face.

What exactly are these problems? The philosophy researcher explains that they can be found at all levels of the AI development chain. “There can be ethical problems that emerge as soon as a model is trained because of the data issue: can the data lead to stereotyping? What are the consequences of the absence of certain data? Has the data used, such as private images and intellectual property, been subject to consent before being used as a training dataset for the model?”

But this is far from being the only problematic link in the chain. “During development and deployment, questions of governance arise. Who owns the model, who designs it and for what purpose? There is also the question of the need for certain models in the light of climate change. Running these models consumes a great deal of energy. In fact, this highlights the fact that only powerful companies have the resources to use them,” warns the researcher.

We can make AI a real empowerment tool that communities could take ownership of.

Fortunately, the picture is not entirely bleak. Artificial intelligence can be used as a tool for empowerment. Giada Pistilli is a member of BigScience, a collaborative project involving thousands of academics that aims to develop an open access language model. According to her, such projects can make AI robustly beneficial. “By developing AI that is specialised on a single task, it can be made more auditable, participatory, and tailored to the community that will use it. By educating users about these new technologies and integrating them into the project of building databases, we can make AI a real empowerment tool that communities can take ownership of.”

Will we be able to rise to these multiple challenges? The question remains.

Julien Hernandez
1. https://www.futura-sciences.com/tech/actualites/intelligence-artificielle-tournesol-algorithme-utilite-publique-besoin-vous-87301/
