What are the next challenges for AI?

Artificial intelligence: a tool for domination or emancipation?

Lê Nguyên Hoang, Co-founder and President of Tournesol, Victor Berger, Post-doctoral researcher at CEA Saclay, and Giada Pistilli, PhD student at Sorbonne University affiliated with the CNRS Science, Norms, Democracy laboratory
On January 17th, 2023
5 min reading time
Key takeaways
  • There are three ways to teach artificial intelligence (AI): supervised learning, unsupervised learning, and reinforcement learning.
  • Machine learning algorithms spot patterns, so the slightest hidden bias in a dataset can be exploited and amplified.
  • Generalising from past experience in AI can be problematic because algorithms use historical data to answer present problems.
  • AI is also a field where a great deal of power is at stake: ethical issues, such as the use of data, can emerge.
  • Communities could take ownership of AI, using it as a true participatory tool for emancipation.

Before tackling the issue of AI bias, it is important to understand how a machine learning algorithm works, but also what that means. Victor Berger, a post-doctoral fellow in artificial intelligence and machine learning at CEA-List, explains, “the basic assumption of most machine learning algorithms is that we have data that is supposedly a statistical representation of the problem we want to solve.”

Three main ways of learning 

The simplest – technically speaking – and most common way to teach a machine learning AI is called supervised learning. “For example, if you have a database full of animal pictures, a supervised algorithm will already know that a picture represents a dog, a cat, a chicken, etc., and it will know that for this input it should give a specific response in output. A classic example of this type of algorithm is language translators,” explains Victor Berger.
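Supervised learning can be illustrated with a minimal sketch. The toy data and the nearest-centroid approach below are hypothetical choices, not from the article: each training example is an (input, label) pair, and the algorithm learns to map new inputs to known labels.

```python
def train_centroids(examples):
    """Compute one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest to the input."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labelled data: (weight_kg, height_cm) -> species
training_data = [
    ([30.0, 60.0], "dog"), ([25.0, 55.0], "dog"),
    ([4.0, 25.0], "cat"), ([5.0, 27.0], "cat"),
    ([2.0, 40.0], "chicken"), ([2.5, 45.0], "chicken"),
]
model = train_centroids(training_data)
print(predict(model, [28.0, 58.0]))  # a new, unlabelled input
```

The key point is that the labels (“dog”, “cat”, “chicken”) are supplied by humans in advance; the algorithm only learns the mapping.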

The second category of algorithms, unsupervised learning, is generally used when we do not have the solution to a problem: “to continue with the example of animals, an unsupervised learning algorithm will be given a database with the same photos as the previous one, without precise instructions on how it should react in output with respect to a given input. Its aim is generally to identify statistical patterns within the dataset it is given, for categorisation (or clustering) purposes.”
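Clustering, the use case mentioned above, can be sketched with a minimal k-means loop. The 1-D data and the choice of k=2 are hypothetical; the point is that the algorithm receives no labels at all.

```python
def kmeans_1d(points, k=2, steps=10):
    """Group unlabelled points into k clusters by iterating:
    assign each point to its nearest centre, then move each
    centre to the mean of its assigned points."""
    centres = points[:k]  # naive initialisation: the first k points
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# Unlabelled measurements: the algorithm is never told which group is which.
data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centres, clusters = kmeans_1d(data)
print(sorted(round(c, 2) for c in centres))  # two discovered group centres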

The whole prob­lem lies in the data sets used to super­vise the algorithms.

The third cat­e­go­ry of algo­rithms is rein­force­ment learn­ing: “In the first two cat­e­gories, the way the algo­rithm is cod­ed allows it to direct itself and know how to improve. This com­po­nent is absent in rein­force­ment learn­ing, where the algo­rithm just knows whether it has com­plet­ed its task cor­rect­ly or not. It has no instruc­tions about which direc­tions to take to become bet­ter. In the end, it is the envi­ron­ment and its reac­tion to the algorithm’s deci­sion mak­ing that will act as a guide,” Vic­tor Berg­er explains.
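The reward-only feedback described above can be sketched with a classic toy problem, the multi-armed bandit. The two actions and their payout probabilities below are invented for illustration: the agent is never told the right answer, only a reward from the environment after each action.

```python
import random

def run_bandit(reward_probs, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: usually pick the action with the best
    estimated reward, sometimes explore at random."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)
    counts = [0] * len(reward_probs)
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:  # exploit the best estimate so far
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # the environment reacts: a reward of 1 with some hidden probability
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # incrementally update the running average of observed rewards
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Action 1 pays off more often; the agent discovers this from rewards alone.
print(run_bandit([0.2, 0.8]))
```

No instruction ever says “choose action 1”; the environment's reactions alone steer the agent towards it.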

In all three cases, the problem lies in the data sets used to supervise the algorithms. Victor Berger reminds us that “machine learning algorithms enable patterns to be identified. Therefore, the slightest bias hidden in a data set can bias the entire algorithm, which will find the biased pattern, then exploit and amplify it.”

Generalisation of data

For Lê Nguyên Hoang, a doctor in mathematics and co-founder of Tournesol, who popularises the subject of artificial intelligence, the hypothesis of the generalisation of data is omnipresent in the field of machine learning: “Questions relating to the quality of data are largely undervalued. Whether in the research world or in industry, it is the design of algorithms that takes centre stage. But very few people ask themselves whether generalising the past by training algorithms with historical databases that we don't look at critically is really a viable project for society.”

To better understand how this can manifest itself, Victor Berger refers to a specific anecdote circulating in the machine learning community: “To avoid gender bias, a company using AI to sort CVs excluded information such as names and photos. But they realised that it had retained football as a relevant criterion.” As careful as the company was, it provided its historical data without anticipating a pattern: the CVs most recruited in the past – those of men – were more likely to mention football as an interest. Far from combating the gender bias, the algorithm nurtured it. There are two solutions for dealing with this type of problem: “either humans are tasked with building higher-quality databases – but this requires a colossal amount of work – or algorithms are tasked with eliminating the biases already identified,” explains Victor Berger.
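The proxy effect in this anecdote can be shown with a few lines of code. All numbers below are invented: even with the gender column hidden from the model, a correlated hobby still encodes it, so past hiring decisions rediscover the bias through it.

```python
# Hypothetical historical records: (gender, plays_football, hired_in_the_past).
# Gender is shown here only so we can construct the correlation; the "model"
# below never reads it.
past_cvs = (
    [("man", True, True)] * 40 + [("man", False, True)] * 10 +
    [("woman", True, False)] * 2 + [("woman", False, False)] * 48
)

def hire_rate(rows, predicate):
    """Historical hire rate among CVs whose football flag satisfies predicate."""
    subset = [hired for _, football, hired in rows if predicate(football)]
    return sum(subset) / len(subset)

# Football alone almost perfectly separates the historically hired
# from the rejected, so an algorithm trained on these labels will use it:
print(hire_rate(past_cvs, lambda f: f))      # hire rate among football players
print(hire_rate(past_cvs, lambda f: not f))  # hire rate among the rest
```

Deleting the sensitive column is not enough when another feature carries the same statistical signal.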

But this does not solve all problems. “If we take the example of content moderation, the labelling of data depends on the concept of freedom of expression that we defend, on what we consider to be or not to be a call to hatred or dangerous false information. These are questions that do not have clear answers and on which there will be disagreement. So, if the problem is not just a technical one, the same goes for the solutions,” says Lê Nguyên Hoang.

Feedback loops

There are also questions about the feedback loops which algorithms can cause. “What you have to bear in mind is that a machine learning algorithm is always prescriptive, because its aim is to achieve a specific objective: maximising presence on a platform, profit, click-through rate, etc.,” points out Lê Nguyên Hoang.

Imagine an algorithm used by a community's police force to predict in which neighbourhood there will be the most crime and aggression. Victor Berger argues that “what this algorithm is going to do is make a prediction based on historical police data that identifies the neighbourhoods where the most people have been arrested.” Here again, we fall back on the same flaw: the risk of generalising – or even amplifying – the past. Indeed, this prediction is not only descriptive; it also leads to decision-making: increasing the number of police officers, increasing video surveillance, etc. Decisions that may concentrate policing on a particular area of the city. Decisions that can reinforce an already tense climate.
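A toy simulation, with entirely hypothetical numbers, makes the loop concrete: patrols are sent where past arrests were recorded, patrols produce new arrest records, and the initial imbalance grows even if underlying crime is the same everywhere.

```python
def simulate(arrests, rounds=5, patrols=100, detection=0.1):
    """Each round, send all patrols to the neighbourhood predicted as the
    'hotspot' from historical arrests; each patrol records `detection`
    new arrests on average. True crime is assumed equal everywhere."""
    arrests = list(arrests)
    history = [list(arrests)]
    for _ in range(rounds):
        hotspot = arrests.index(max(arrests))   # prediction from past data
        arrests[hotspot] += patrols * detection  # patrols generate new records
        history.append(list(arrests))
    return history

# Two neighbourhoods with identical real crime, but a slightly biased
# arrest history (60 vs 40 past arrests):
for row in simulate([60.0, 40.0]):
    print(row)  # the recorded gap widens round after round
```

Because the second neighbourhood is never patrolled, its crime is never recorded, and the data keeps “confirming” the initial prediction.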

The phenomena of radicalisation, sectarian movements and conspiracy circles can be amplified.

Similarly, on social media and entertainment platforms, recommendation algorithms are based on the user's previous choices. Their objective is generally to keep the user's attention for as long as possible. As a result, the phenomena of radicalisation, sectarian movements and conspiracy theories can be amplified. Lê Nguyên Hoang is working on solving this problem with the help of an algorithm, called Tournesol, whose database is built up in a collaborative manner.
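The same loop structure applies to recommendation. In this hypothetical sketch (all numbers invented), the system recommends whatever category the user clicked most, each recommendation makes another click in that category more likely, and the user's diet of content narrows over time.

```python
import random

def simulate_feed(clicks, steps=200, seed=1):
    """clicks: initial click counts per content category."""
    rng = random.Random(seed)
    clicks = list(clicks)
    for _ in range(steps):
        # recommend the historically most-clicked category
        recommended = clicks.index(max(clicks))
        # assumption: the user follows the recommendation 90% of the
        # time, and clicks something else 10% of the time
        if rng.random() < 0.9:
            clicks[recommended] += 1
        else:
            clicks[rng.randrange(len(clicks))] += 1
    return clicks

# Three categories with a nearly uniform starting history: a tiny initial
# preference for category 0 snowballs into a heavily skewed feed.
print(simulate_feed([11, 10, 10]))
```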

Questions of power 

Artificial intelligence is therefore not just a field of scientific study or a technological field of application. It also has a great deal of power at stake. “It is very important to analyse and list the various social and ethical problems that can arise from these algorithms, from their training to their design and deployment,” warns Giada Pistilli, a philosophy researcher and senior ethicist at Hugging Face.

What exactly are these problems? The philosophy researcher explains that they can be found at all levels of the AI development chain. “There can be ethical problems that emerge as soon as a model is trained because of the data issue: can the data lead to stereotyping? What are the consequences of the absence of certain data? Has the data used, such as private images and intellectual property, been subject to consent before being used as a training dataset for the model?”

But this is far from being the only problematic link in the chain. “During development and deployment, questions of governance arise. Who owns the model, who designs it, and for what purpose? There is also the question of the need for certain models in the light of climate change. Running these models consumes a great deal of energy. In fact, this highlights the fact that only powerful companies have the resources to use them,” warns the researcher.

We can make AI a real empowerment tool that communities could take ownership of.

Fortunately, the picture is not all bleak. Artificial intelligence can be used as a tool for empowerment. Giada Pistilli is a member of BigScience, a collaborative project involving thousands of academics that aims to develop an open-access language model. According to her, such projects can make AI robustly beneficial. “By developing AI that is specialised on a single task, it can be made more auditable, participatory, and tailored to the community that will use it. By educating users about these new technologies and integrating them into the project of building databases, we can make AI a real empowerment tool that communities can take ownership of.”

Will we be able to rise to these multiple challenges? The question remains.

Julien Hernandez
