Generated using AI
π Digital

Behind every algorithm: the unseen workers powering AI

Antonio Casilli
Sociologist, Professor at Télécom Paris (IP Paris), and Research Associate at the Minderoo Centre for Democracy and Technology, University of Cambridge
Key takeaways
  • AI training relies on humans; these are not just experts, but rather many underpaid and vulnerable workers who keep systems running on a day-to-day basis.
  • These workers are increasingly young, mainly aged between 18 and 35 (sometimes up to 44 or older), and are overqualified and over-educated for the work they do.
  • Data workers are highly exposed to psychosocial risks: many work in isolation from home, whilst others work in highly restrictive and vulnerable conditions.
  • Some data workers find themselves working in their own countries on tools that could subsequently be deployed by foreign governments against them.

How would you define the concept of invisible labour behind AI?

Antonio Casilli. First, an important clarification: this work is not “invisible”. If one studies this phenomenon as I do, by meeting workers in countries such as Madagascar, Kenya or Brazil, one discovers that their work is hidden but not invisible. So, the definition I can give is that of work involving the preparation, verification and sometimes imitation of artificial intelligence. That is to say, everything to do with training, quality control and model alignment.

This work is essential for the existence of current artificial intelligence models, as they are largely based on machine learning systems. And “learning” means that people have to teach the machine. And despite what one might think, this training is not only provided by experts paid hundreds of thousands of euros a year. It is also, and above all, people on very low wages and in rather precarious situations who keep the machines running. These hidden workers have been there for about twenty years, ever since we started developing data-driven and machine learning-based artificial intelligence.

Have you noticed any changes or developments in recent years, and since the advent of the ChatGPT era?

Since we began our research in 2016 as part of the DiPLab, we have indeed observed changes. And I can tell you that these changes are not where one might have expected them. We would have expected them to be more evident in terms of task complexity, given that we are dealing with models that are becoming increasingly powerful and large in scale. When you have to manage up to a trillion parameters, you’d imagine the model is so complex that the people tasked with training and verifying it must also be top specialists: not just “data workers”, but genuine “AI tutors”. These are the types of profiles that are frequently advertised on platforms like LinkedIn. But this is not actually the case.

Currently, in the post-ChatGPT era (i.e. post-2022), a few specialists have certainly been recruited, but they are very much in the minority. Their visibility is merely a communication strategy employed by the tech giants and artificial intelligence producers, aimed at their investors. Most of this workforce still consists of the people we encounter in our investigations, who are poorly paid and carry out fragmented work, sometimes referred to as “piecework”.

Since 2019–2020, we have observed that these workers are increasingly young. Internationally, and even in Europe, most are now between 18 and 35 years old, although some may be as old as 44, or even approaching retirement age. Many of them are young people with a disproportionately high level of education relative to the tasks they perform, their pay, and the nature of the work.

So, these people are overqualified and over-educated for the work they do. The AI market, in this respect, is a labour market that does not function properly, because it is not effectively allocating the best workers, the youngest or the most highly qualified, to the jobs that pay the most. Moreover, these data workers are not even offered permanent positions. This workforce, recruited via platforms, is paid on a piecework basis, without a proper employment contract. Very often, the only agreement is a set of standard terms and conditions of use, which must be accepted online and which do not guarantee workers’ rights. If they do have a contract, it is a fixed-term contract lasting between one and three months. What is more, they often reside in countries where worker protection is so weak that even such a contract remains a pipe dream.

In your work, you also address the many psychosocial risks to which data workers are exposed. Could you tell us more about this?

Data workers are a group that is highly exposed to psychosocial risks. A significant proportion of them experience isolation, for example when working from home. Others work in highly restrictive environments, which leave them vulnerable. To give you an example, content and chatbot moderators, who are also data workers, work in secure offices where they are subject to very close supervision yet are unable to communicate with one another. Paradoxically, this creates a situation of both intense pressure and severe isolation at the same time.

Many of them suffer from severe post-traumatic stress disorder, and we see this quite consistently. Some people can even be left broken. I remember taking the metro in a European city with a former moderator who had spent months watching videos of people throwing themselves under trains. This person was so deeply traumatised that they had to stand far away from the edge of the platform and keep clear of the trains, unable to control their reactions or bear being near the metro.

Have you, however, noticed the emergence of counterforces or countertrends amongst these workers? And if so, what are they?

Yes, there are definitely countertrends, which are not a recent phenomenon, nor are they spontaneous, in the sense that it is not the system that is self-correcting or stabilising itself. These countertrends are linked to social conflicts, disputes, and trade union struggles in certain sectors, and to the fact that some groups manage to attract the attention of the public and decision-makers. This is the case with content moderators, whose cause is now widely known.

I can give you a concrete example from Kenya, where our colleagues and the trade unionists we work with on the ground are extremely well organised, particularly because Kenya is a country with a strong trade union tradition and a relatively progressive constitution. There is both a movement and a highly developed civil society. And although these individuals are sometimes migrants themselves from other countries, such as Somalia, Nigeria, Ethiopia, or South Africa, they are embedded in wider networks, and these networks are giving rise to a proliferation of trade unions, associations and other alliances. The name may change, the label may change, but there are a huge number of initiatives of this kind at present.

In Europe, other movements are also taking shape. I am thinking, for example, of Germany, where a feminist trade union called Superrr is working to mobilise public opinion and decision-makers. In the United States, too, “cross-collar” movements (i.e. cross-sectoral unions bringing together white-collar and blue-collar workers, such as the Alphabet Workers Union) are organising, even though industrialists are very powerful there.

What have you observed regarding the militarisation of digital technologies and applied AI?

Firstly, the scope of AI is enormous, as this is a dual-use technology: it has both civilian and military applications, which can obviously be used simultaneously. This is, in fact, par for the course, since the structure of funding and forms of research support have incorporated the military sphere from the outset. This extends even into the sovereign sphere, as we are dealing with issues of sovereignty in the strictly nationalist sense of the term. It is a question of a country’s independence and resilience, of having its own infrastructure, but also of projecting an aggressive and, alas, warmongering image and ideology.

Today, the situation has become far more complex, particularly with the overlap and intersection between artificial intelligence training and military objectives. Some data workers thus find themselves working in their own countries on tools that could subsequently be deployed by foreign governments against them. Very specific cases have been documented in Syria, Palestine, and several African countries.

We have also observed a sort of continuum between humanitarian initiatives and military implications in active theatres of war. The activities of major AI players have been documented across numerous theatres of operation, via facial recognition applications, for example. In Ukraine, many artificial intelligence giants have been identified as suppliers of tools and data. I am referring here to Google, Palantir Technologies, Clearview AI, Microsoft, SpaceX, Anthropic and OpenAI. The activities of these companies are problematic, in that they combine labour-intensive AI production with applications that are increasingly being used for military purposes.

Personally, as a conscientious objector, I am outraged by this situation. I am outraged as an academic too, especially as it makes our work even more difficult: in a geopolitical context characterised by tensions, not only with adversaries but also with historical allies, the situation becomes particularly complicated for us researchers.

In this context, carrying out the basic work of our research becomes complex. The simple act of correctly documenting the nature of military intervention and the misuse of technology by political and economic actors whose interests align with armed conflicts becomes a sensitive, even dangerous, undertaking.

Interview by Marie Varasson
