
Why we need ‘explainable’ AI

Véronique Steyer
Associate Professor in Innovation Management at École Polytechnique (IP Paris)
Louis Vuarin
Postdoctoral fellow in SES department of Telecom Paris (I3-CNRS, IP Paris)
Sihem Ben Mahmoud-Jouini
Associate Professor of Innovation Management at HEC and researcher at GREGHEC

The more AI algorithms spread, the more their opacity raises questions. In the Facebook Files, Frances Haugen, a specialist in algorithmic ranking, took internal documents with her when she left the company and sent them to the American regulatory agency and to Congress. In doing so, she claims to reveal an algorithmic system that has become so complex that it escapes the control of the very engineers who developed it. While the role of the social network in propagating dangerous content and fake news is worrying, the inability of its engineers, who are among the best in the world in their field, to understand how their creation works is troubling in itself. The case once again puts the spotlight on the problems posed by the (in)explainability of the algorithms proliferating across many industries.

Understanding “opaque algorithms”

The opacity of algorithms does not concern all AIs. Rather, it is characteristic of learning algorithms, known as machine learning, and particularly of deep-learning algorithms such as neural networks. Given the growing importance of these algorithms in our societies, it is not surprising that the concern for their explainability – our ability to understand how they work – is mobilising regulators and ethics committees.

How can we trust the systems to which we are going to entrust our lives? AI may soon be driving our vehicles or diagnosing serious illnesses. More broadly, it may shape the future of our societies through the information that circulates, the political ideas that are propagated, and the discrimination that lurks. How can we attribute responsibility for an accident or a harmful side effect if it is not possible to open the ‘black box’ that is an AI algorithm?

Within the organisations that develop and use these opaque algorithms, this quest for explainability intertwines two issues, which it is useful to distinguish. On the one hand, it is a question of explaining AI so that we may understand it (i.e. interpretability), with the aim of improving the algorithm, perfecting its robustness and foreseeing its flaws. On the other hand, we are looking to explain AI in order to identify who is accountable for the functioning of the system to the many stakeholders, both external (regulators, users, partners) and internal to the organisation (managers, project leaders). This accountability is crucial so that responsibility for the effects – good or not so good – of these systems can be attributed and assumed by the right person(s).

Opening the “black box”

A whole field of research is therefore developing today to try to meet the technical challenge of opening the ‘black box’ of these algorithms: Explainable Artificial Intelligence (XAI). But is making these systems transparent enough to meet the dual challenge of explainability: interpretability and accountability? In our discipline, management sciences, work is being done on the limits of transparency. We studied the way individuals within organisations take hold of the explanatory tools that have been developed, such as heat maps¹ and the performance curves² often associated with algorithms. We have sought to understand how organisations use them.

Indeed, these two types of tools seem to be in the process of being standardised in response to pressure from regulators to explain algorithms, as illustrated by article L4001‑3 of the French public health code: “III… The designers of an algorithmic treatment… ensure that its operation is explainable for users” (law no. 2021–1017 of 2 August 2021). Our initial results highlight the tensions that arise within organisations around these tools. Let us take two typical examples.

Example #1: Explaining reasoning behind a particular decision

The first example illustrates the ambiguities of these tools when they are used by business experts who are not familiar with how they work. Figure 1, taken from Wehbe et al. (2021), shows heat maps used to explain the operation of an algorithm developed to detect Covid-19 from lung X‑rays. This system performs very well in tests, comparably to, and often better than, a human radiologist. Studying these images, we note that in cases A, B, and C, the heat maps tell us that the AI based its decision on an area corresponding to the lungs. In case D, however, the illuminated area is located around the patient’s collarbone.

Figure 1: Heat maps – here from DeepCOVID-XR, an AI algorithm trained to detect COVID-19 on chest X‑rays. Source: Wehbe, R. M., Sheng, J., Dutta, S., Chai, S., Dravid, A., Barutcu, S., … & Katsaggelos, A. K. (2021). DeepCOVID-XR: An Artificial Intelligence Algorithm to Detect COVID-19 on Chest Radiographs Trained and Tested on a Large US Clinical Dataset. Radiology, 299(1), E173.

This observation highlights a problem often found in our study of the use of these tools in organisations: designed by developers to spot inadequate AI reasoning, these tools are often, in practice, used to make judgements about the validity of a particular case. However, these heat maps are not designed for this purpose, and for an end user it is not the same thing to say: “this AI’s diagnosis seems accurate” and “the process by which it determines its diagnosis seems valid.”

Two problems are then highlighted. Firstly, there is the question of training users, not only in AI, but also in the workings of these explanatory tools. More fundamentally, this example raises questions about the use of these tools in assigning responsibility for a decision: by disseminating them, there is a risk of leading users to take responsibility on the basis of tools that are being used for decisions other than their intended purpose. In other words, these tools, created for interpretability, risk being used in situations where accountability is at stake.

Example #2: Explaining or evaluating performance 

A second example illustrates the organisational games played around explanatory tools such as performance curves. A developer working in the Internet advertising sector told us that, in her field, if an algorithm’s performance did not exceed 95%, customers would not buy the solution. This standard had concrete effects on the innovation process: under pressure from the sales force, it led her company to commercialise, and focus its R&D efforts on, a very efficient but not very explainable algorithm. In doing so, the company abandoned the development of another algorithm that was more explainable but not as powerful.

The problem was that the algorithm they chose to commercialise could not handle radical changes in consumer habits, such as those seen on Black Friday. Because it was difficult to acknowledge that the algorithm’s performance was falling, the company ended up advising customers not to run any marketing campaigns at that time. Again, the tool was misused: instead of serving as one element among others in choosing the best algorithm for a given activity, the performance curves were used by buyers to justify the choice of a provider against an industry standard. In other words: for accountability.

The “man-machine-organisation” interaction

For us, researchers in management sciences interested in decision-making and innovation processes, in the role of the tools that support them, and in the way actors legitimise the choices they make within their organisations, these two examples show that it is not enough to look at these computational tools in isolation. They call for approaching algorithms together with their human, organisational and institutional contexts, as assemblages. Holding such an assemblage accountable then requires not just seeing inside one of its components, such as the algorithm – a difficult and necessary task – but understanding how the assemblage functions as a system composed of machines, humans, and organisation. It also means being wary of too rapid a standardisation of explainability tools, which would produce man-machine-organisation systems of games in which the imperative to be accountable would replace that of interpreting, multiplying black boxes whose explainability would be only a façade.

¹ A heat map visually indicates which part of the image (which pixels) the algorithm used to reach a decision, and in particular to classify the image into a certain category (see figure 1).
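As a purely illustrative sketch (not the method used by DeepCOVID-XR or by the tools we studied), one simple way to build such a heat map is occlusion: mask each region of the image in turn and record how much the model’s score drops. The toy model and all names below are hypothetical.

```python
import numpy as np

def occlusion_heat_map(image, score_fn, patch=4):
    """Mask each patch of the image and record how much the score drops.

    Regions where the drop is largest are the pixels the model relied on.
    """
    base = score_fn(image)
    heat = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            # Replace the patch with the image's mean intensity.
            occluded[i:i + patch, j:j + patch] = image.mean()
            heat[i:i + patch, j:j + patch] = base - score_fn(occluded)
    return heat

def toy_score(img):
    # Hypothetical "model": its score is just the brightness
    # of the top-left quadrant of the image.
    return img[:8, :8].mean()

img = np.zeros((16, 16))
img[:8, :8] = 1.0  # bright region the toy model actually relies on
heat = occlusion_heat_map(img, toy_score)
# Only patches overlapping the top-left quadrant change the score,
# so the heat map "lights up" exactly there.
```

The sketch shows why a heat map explains the *process* (which pixels mattered) rather than the *correctness* of any particular prediction.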
² A performance curve is a 2D graph representing the mathematical accuracy of the prediction as a function of a significant parameter (such as the number of images provided to the AI for its analysis, or the time in milliseconds allowed for the calculations). This curve is then compared to points indicating human performance.
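Such a curve can be sketched in a few lines; the saturating formula and the human-baseline figure below are invented for illustration, not taken from any study cited here.

```python
import numpy as np

def toy_accuracy(n_images):
    # Hypothetical learning curve: accuracy rises with the number of
    # training images and saturates just below 1.0 (illustrative only).
    return 1.0 - 0.5 * np.exp(-np.asarray(n_images, dtype=float) / 200.0)

sizes = np.array([50, 100, 200, 400, 800, 1600])
curve = toy_accuracy(sizes)

HUMAN_BASELINE = 0.92  # assumed human accuracy, for comparison only

for n, acc in zip(sizes, curve):
    flag = "above" if acc > HUMAN_BASELINE else "below"
    print(f"{n:5d} images -> accuracy {acc:.3f} ({flag} human baseline)")
```

Plotting `curve` against `sizes`, with the human baseline as a horizontal line, gives exactly the kind of graph that – as the second example shows – can slide from an interpretability aid into a sales argument.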

Contributors

Véronique Steyer

Associate Professor in Innovation Management at École Polytechnique (IP Paris)

Véronique Steyer holds a Master's degree in Management from ESCP Europe, a PhD in Management Sciences from the University of Paris Ouest and a PhD from ESCP Europe. Véronique Steyer was a visiting researcher at the University of Warwick (UK), and also has experience as a management consultant. Her research focuses on how individual actions and interactions between actors shape organisational phenomena and, more broadly, contribute to shaping the society in which we live. Her work focuses on sensemaking in situations of change: transformations induced by the introduction of new technologies such as AI, perception of a crisis or a risk (pandemic, ecological transition).

Louis Vuarin

Postdoctoral fellow in SES department of Telecom Paris (I3-CNRS, IP Paris)

A former risk manager and graduate of the ESCP Master in Management, Louis Vuarin obtained a PhD in Management Sciences from ESCP in 2020. He was a visiting researcher at SCORE in Stockholm and a post-doctoral fellow at CRG (École Polytechnique). His current research focuses on epistemic processes within organisations, in particular on the question of the explainability of artificial intelligence (XAI) and its organisational dimension, as well as on phenomena of resistance to innovation in the context of 5G.

Sihem Ben Mahmoud-Jouini

Associate Professor of Innovation Management at HEC and researcher at GREGHEC

Sihem Ben Mahmoud-Jouini is the Scientific Director of the X-HEC Entrepreneurs MS, the Innovation Design Project (IDP) and Entrepreneurship & Innovation (Executive MBA). Her research focuses on the management of disruptive innovations in established companies and the organisation of the exploration of new fields. She is also interested in the role of design in innovation and organisational transformation. Her research has been conducted at Assistance Publique des Hôpitaux de Paris, Vinci, Thales, Valeo, Air Liquide, Essilor, Danone and Orange, where she held the Chair.