Behind every algorithm: the unseen workers powering AI

Antonio Casilli

Sociologist, Professor at Télécom Paris (IP Paris), and Research Associate at the Minderoo Centre for Democracy and Technology, University of Cambridge

Key takeaways

AI training relies on humans; these are not just experts, but rather many underpaid and vulnerable workers who keep systems running on a day-to-day basis.
These workers are increasingly young, mainly aged between 18 and 35 (sometimes up to 44 or older), and are overqualified and over-educated for the work they do.
Data workers are highly exposed to psychosocial risks: many work in isolation from home, whilst others work in highly restrictive and vulnerable conditions.
Some data workers find themselves working in their own countries on tools that could subsequently be deployed by foreign governments against them.

How would you define the concept of invisible labour behind AI?

Antonio Casilli. First, an important clarification: this work is not “invisible”. If one studies this phenomenon as I do, by meeting workers in countries such as Madagascar, Kenya or Brazil, one discovers that their work is hidden but not invisible. So, the definition I can give is that of work involving the preparation, verification and sometimes imitation of artificial intelligence. That is to say, everything to do with training, quality control and model alignment.

This work is essential for the existence of current artificial intelligence models, as they are largely based on machine learning systems. And “learning” means that people have to teach the machine. And despite what one might think, this training is not only provided by experts paid hundreds of thousands of euros a year. It is also, and above all, people on very low wages and in rather precarious situations who keep the machines running. These hidden workers have been there for about twenty years, ever since we started developing data-driven and machine learning-based artificial intelligence.

Have you noticed any changes or developments in recent years, and since the advent of the ChatGPT era?

Since we began our research in 2016 as part of the DiPLab, we have indeed observed changes. And I can tell you that these changes are not where one might have expected them. We would have expected them to be more evident in terms of task complexity, given that we are dealing with models that are becoming increasingly powerful and large in scale. When you have to manage up to a trillion parameters, you’d imagine the model is so complex that the people tasked with training and verifying it must also be top specialists: not just “data workers”, but genuine “AI tutors”. These are the types of profiles frequently advertised on platforms like LinkedIn. But this is not actually the case.

Currently, in the post-ChatGPT era (i.e. post-2022), a few specialists have certainly been recruited, but they are very much in the minority. Their visibility is merely a communication strategy employed by the tech giants and artificial intelligence producers, aimed at their investors. Most of this workforce still consists of the people we encounter in our investigations, who are poorly paid and carry out fragmented work, sometimes referred to as “piecework”.

Since 2019–2020, we have observed that these workers are increasingly young. Internationally, even in Europe, most are now between 18 and 35 years old, although some may be as old as 44, or even approaching retirement age. Many of them are young people with a disproportionately high level of education relative to the tasks they perform, their pay, and the nature of the work.

So, these people are overqualified and over-educated for the work they do. The AI market, in this respect, is a labour market that does not function properly, because it is not effectively allocating the best workers, the youngest or the most highly qualified, to the jobs that pay the most. Moreover, these data workers are not even offered permanent positions. This workforce, recruited via platforms, is paid on a piecework basis, without a proper employment contract. Very often, these are simply standard terms and conditions of use, which must be accepted online, and which do not guarantee workers’ rights. If they do have a contract, it is a fixed-term contract lasting between one and three months. Moreover, they often reside in countries where worker protection is so weak that such a contract remains a pipe dream.

In your work, you also address the many psychosocial risks to which data workers are exposed. Could you tell us more about this?

Data workers are a group that is highly exposed to psychosocial risks. A significant proportion of them experience isolation, for example when working from home. Others work in highly restrictive environments, which leave them vulnerable. To give you an example, content and chatbot moderators, who are also data workers, work in secure offices where they are subject to very close supervision yet are unable to communicate with one another. Paradoxically, this creates a situation of both intense pressure and severe isolation at the same time.

Many of them suffer from severe post-traumatic stress disorder, and we see this quite consistently. Some people can even be left broken. I remember taking the metro in a European city with a former moderator who had spent months watching videos of people throwing themselves under trains. This person was so deeply traumatised that they had to stand far from the edge of the platform and avoid the trains, unable to control themselves or bear being near the metro.

Have you, however, noticed the emergence of counterforces or countertrends amongst these workers? And if so, what are they?

Yes, there are definitely countertrends. They are not a recent phenomenon, nor are they spontaneous, in the sense that it is not the system that is self-correcting or stabilising itself. These countertrends are linked to social conflicts, disputes, and trade union struggles in certain sectors, and to the fact that some workers manage to attract the attention of the public and decision-makers. This is the case with content moderators, whose cause is now widely known.

I can give you a concrete example from Kenya, where our colleagues and the trade unionists we work with on the ground are extremely well organised, particularly because Kenya is a country with a strong trade union tradition and a relatively progressive constitution. There is both a movement and a highly developed civil society. And although these individuals are sometimes migrants themselves from other countries, such as Somalia, Nigeria, Ethiopia, or South Africa, they are embedded in wider networks, and these networks are giving rise to a proliferation of trade unions, associations and other alliances. The name may change, the label may change, but there are a huge number of initiatives of this kind at present.

In Europe, other movements are also taking shape. I am thinking, for example, of Germany, where a feminist trade union called Superrr is working to mobilise public opinion and decision-makers. In the United States, too, “cross-collar” movements (i.e. cross-sectoral unions bringing together white-collar and blue-collar workers, such as the Alphabet Workers Union) are organising, even though industrialists are very powerful there.

What have you observed regarding the militarisation of digital technologies and applied AI?

Firstly, the scope of AI is enormous, as this is a dual-use technology: it has both civilian and military applications, which can obviously be used simultaneously. This is, in fact, par for the course, since the structure of funding and forms of research support have incorporated the military sphere from the outset. This extends even into the sovereign sphere, as we are dealing with issues of sovereignty in the strictly nationalist sense of the term. It is a question of a country’s independence and resilience, of having its own infrastructure, but also of projecting an aggressive and, alas, warmongering image and ideology.

Today, the situation has become far more complex, particularly with the overlap and intersection between artificial intelligence training and military objectives. Some data workers thus find themselves working in their own countries on tools that could subsequently be deployed by foreign governments against them. Very specific cases have been documented in Syria, Palestine, and several African countries.

We have also observed a sort of continuum between humanitarian initiatives and military implications in active theatres of war. The activities of major AI players have been documented across numerous theatres of operation, via facial recognition applications, for example. In Ukraine, many artificial intelligence giants have been identified as suppliers of tools and data. I am referring here to Google, Palantir Technologies, Clearview AI, Microsoft, SpaceX, Anthropic and OpenAI. The activities of these companies are problematic, in that they combine labour-intensive AI production with applications that are increasingly being used for military purposes.

Personally, as a conscientious objector, I am outraged by this situation. I am outraged as an academic too, especially as it makes our work even more difficult: in a geopolitical context characterised by tensions, not only with adversaries but also with historical allies, the situation becomes particularly complicated for us researchers.

In this context, carrying out the basic work of our research becomes complex. The simple act of correctly documenting the nature of military intervention and the misuse of technology by political and economic actors whose interests align with armed conflicts becomes a sensitive, even dangerous, undertaking.

Interview by Marie Varasson
