AI, a weapon against tax fraud

Christophe Gaie
Head of the Engineering and Digital Innovation Division at the Prime Minister's Office
Key takeaways
  • Tax fraud is a major issue: it makes up a significant share of the tax gap, which is estimated at between 4% and 15% of the sums owed in various OECD countries.
  • In France, there is a desire to step up the fight against fraud, in particular by using artificial intelligence tools.
  • The CISIRH has developed an operational and theoretical framework for comparing different fraud detection algorithms around the world.
  • To combat tax fraud effectively, AI and algorithms will not be enough; this fight must be part of a collective and human approach.

Detecting tax fraud is a major challenge, particularly in the current context of high government deficits. Fraud accounts for a significant proportion of the tax gap, which is estimated at between 4% and 15% of the sums owed in various OECD countries. In France, for example, VAT fraud alone is estimated at €20–25bn [1]. As a result, the Cour des Comptes has published numerous studies highlighting the importance of stepping up the fight against fraud [2]. In France, tax fraud is monitored by the DGFiP, which uses several artificial intelligence tools that have produced very promising results.

With this in mind, Christophe Gaie set up a project group with students from CentraleSupélec. Together, they carried out a research study aimed at putting in place an operational framework (methodology, algorithmic approach, computer code, simulation data, etc.) and sharing it with all those involved in the fight against fraud [3].

What was the aim of this study?

This project is a continuation of more theoretical research that helped to define and articulate the various concepts, issues and directions in the field [4]. It extends that theoretical work and puts it into practice by proposing an operational framework in which researchers from all over the world can develop and compare their algorithms.

As tax optimisation is not prohibited, our work has focused on fraud in the sense of irregularities. We have also concentrated our efforts on detecting fraud perpetrated by individuals, as fraud by legal entities can be dealt with elsewhere.

Where does your database for this study come from?

A tax file can contain a lot of data relating to the individual: family situation, income, assets, etc. Whether in the laboratory or when studying real data, it is not always possible to have all the data available. We therefore created a fictitious database based on a set of pre-selected data: socio-professional category, income, expenditure and value of property. This database can of course be added to at a later date.

For perfectly valid reasons of confidentiality of personal data, the DGFiP cannot make data available for the detection of tax fraud. As a result, each researcher builds up his or her own database independently, which has proved detrimental for several reasons. First, it is time-consuming and requires the researcher to master concepts such as income and assets before being able to detect fraud. Second, researchers' algorithms are not necessarily comparable with each other, whereas shared reference databases are a classic approach in the field of digital research (databases of references, telecommunication signals or images, for example).

How does this AI detect fraud?

The AI is based on a tax file model that allows the selection of files to be checked according to configurable criteria. Based on our knowledge of the majority of fraud cases, we have defined a taxpayer's likelihood of fraud according to different typologies:

  • High expenses and/or high assets in relation to income,
  • Low expenses and/or assets compared to income,
  • High wealth compared to similar people in the same socio-professional category.
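The three typologies above can be read as simple rules on a tax file. The sketch below is only an illustration of that reading, not the study's model: the `TaxFile` fields, the flag names and every threshold are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class TaxFile:
    category: str    # socio-professional category
    income: float    # declared annual income
    expenses: float  # annual expenditure
    assets: float    # value of property and other assets

# Hypothetical thresholds, chosen only for illustration.
HIGH_SPEND_RATIO = 1.2   # expenses above 120% of income
HIGH_ASSET_YEARS = 15.0  # assets worth more than 15 years of income
LOW_SPEND_RATIO = 0.2    # expenses below 20% of income
LOW_ASSET_YEARS = 0.1    # assets worth less than 10% of a year of income
WEALTH_FACTOR = 3.0      # assets above 3x the median of the same category

def typologies(f, category_median_assets):
    """Return the list of typologies that a tax file matches."""
    income = max(f.income, 1.0)  # avoid dividing by zero
    spend, wealth = f.expenses / income, f.assets / income
    flags = []
    if spend > HIGH_SPEND_RATIO or wealth > HIGH_ASSET_YEARS:
        flags.append("high_vs_income")
    if spend < LOW_SPEND_RATIO or wealth < LOW_ASSET_YEARS:
        flags.append("low_vs_income")
    if f.assets > WEALTH_FACTOR * category_median_assets:
        flags.append("high_wealth_vs_peers")
    return flags

def category_medians(files):
    """Median assets per socio-professional category."""
    by_cat = {}
    for f in files:
        by_cat.setdefault(f.category, []).append(f.assets)
    return {cat: median(values) for cat, values in by_cat.items()}

files = [
    TaxFile("employee", 30_000, 27_000, 60_000),
    TaxFile("employee", 32_000, 45_000, 80_000),    # spends more than declared income
    TaxFile("employee", 28_000, 25_000, 900_000),   # far wealthier than peers
    TaxFile("executive", 90_000, 10_000, 400_000),  # very low spending
]
medians = category_medians(files)
flagged = {i: typologies(f, medians[f.category]) for i, f in enumerate(files)}
```

Keeping the thresholds as named constants mirrors the idea of configurable criteria: an auditor could tighten or loosen them without touching the rules themselves.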

The dataset [5] was compiled using reference data published by INSEE, taking into account the distribution of socio-professional categories, the distribution of income and wealth, and the distribution of expenditure according to these socio-professional categories. The division into categories is based on a simple percentage of the actual situation. For the other parameters, we have used a Singh-Maddala distribution [6].
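To make the Singh-Maddala step concrete, here is a minimal sketch of how synthetic incomes could be drawn from such a distribution by inverting its cumulative distribution function, F(x) = 1 - (1 + (x/b)^a)^(-q). The parameter values `A`, `B` and `Q` and the function names are assumptions made for the example; they are not the values fitted to the INSEE data in the study.

```python
import random

def singh_maddala_cdf(x, a, b, q):
    """CDF of the Singh-Maddala distribution for x >= 0."""
    return 1.0 - (1.0 + (x / b) ** a) ** (-q)

def singh_maddala_quantile(u, a, b, q):
    """Inverse CDF: the value x such that F(x) = u, for 0 <= u < 1."""
    return b * ((1.0 - u) ** (-1.0 / q) - 1.0) ** (1.0 / a)

def sample_incomes(n, a, b, q, seed=0):
    """Draw n synthetic incomes by inverse-transform sampling."""
    rng = random.Random(seed)
    return [singh_maddala_quantile(rng.random(), a, b, q) for _ in range(n)]

# Hypothetical shape (A, Q) and scale (B) parameters, for illustration only.
A, B, Q = 2.5, 30_000.0, 1.5
incomes = sample_incomes(10_000, A, B, Q)
theoretical_median = singh_maddala_quantile(0.5, A, B, Q)
share_below = sum(x < theoretical_median for x in incomes) / len(incomes)
```

With enough draws, about half of the simulated incomes should fall below the theoretical median, which gives a quick sanity check on the sampler.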


We have developed different types of algorithms to detect potential fraud cases: some are based on neural networks with different sampling strategies, others on a random forest, i.e. a collection of decision trees used to solve a classification problem.
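As a rough illustration of the random forest idea, the sketch below grows a handful of shallow decision trees in plain Python, each on a bootstrap sample and with a randomly drawn feature at every split, then averages their votes. It is a deliberately simplified stand-in, not the code published with the study: the two features (spending and wealth relative to income), the labelling rule in `make_files` and all parameter values are assumptions.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Most frequent label (ties count as fraud)."""
    return int(2 * sum(labels) >= len(labels))

def build_tree(rows, labels, depth, rng):
    """Grow one tree; each split considers a single randomly drawn feature."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    f = rng.randrange(len(rows[0]))
    best = None
    for t in sorted({r[f] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[f] <= t]
        right = [y for r, y in zip(rows, labels) if r[f] > t]
        if not right:
            continue
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # the feature is constant in this node
        return majority(labels)
    t = best[1]
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth - 1, rng),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth - 1, rng))

def predict_tree(node, row):
    """Walk down the tree until a leaf (an int label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, depth=3, seed=0):
    """Train each tree on a bootstrap sample of the files."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], depth, rng))
    return forest

def fraud_score(forest, row):
    """Share of trees that vote 'fraud' for this file."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def make_files(n, rng):
    """Hypothetical synthetic files: (expenses / income, assets / income)."""
    rows, labels = [], []
    for _ in range(n):
        spend, wealth = rng.uniform(0.3, 1.5), rng.uniform(0.0, 18.0)
        rows.append((spend, wealth))
        labels.append(int(spend > 1.2 or wealth > 15.0))  # assumed labelling rule
    return rows, labels

rows, labels = make_files(300, random.Random(42))
forest = fit_forest(rows, labels)
suspicious = fraud_score(forest, (2.0, 30.0))  # high spending and high wealth
ordinary = fraud_score(forest, (0.8, 8.0))     # unremarkable file
```

The score is a vote share between 0 and 1, so the files can be ranked and the cut-off adjusted depending on whether precision or sensitivity matters more.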

Have these algorithms been used on real cases?

Although the algorithms have not been run on real data, it is quite possible to share these elements with public officials, in particular those of the DGFiP's SJCF-1D "Control programming and data analysis" office, where one of the students subsequently completed an internship. Any collaboration with, or feedback from, a public entity would be an opportunity to be seized.

What is the level of accuracy?

It is important to remember that detection involves a trade-off between precision (i.e. the rate of correct predictions among positive responses) and sensitivity, also called recall (i.e. the rate of positive individuals detected by the model). The results of an algorithm are therefore expressed in terms of a metric that reflects this trade-off: the AUPRC ("area under the precision-recall curve").
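To show what this metric measures, the following sketch ranks files by fraud score, computes precision and sensitivity after each file, and sums the area under the resulting step curve (the "average precision" estimate of the AUPRC). The scores and labels are made-up values, and the sketch assumes that all scores are distinct and that at least one file is fraudulent.

```python
def precision_recall_points(scores, labels):
    """(recall, precision) after each file, going from highest to lowest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    true_positives = 0
    points = []
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / total_positives, true_positives / k))
    return points

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve."""
    area, previous_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(scores, labels):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area

# Made-up example: 1 marks a fraudulent file, 0 a compliant one.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
auprc = average_precision(scores, labels)  # (1 + 2/3 + 3/4) / 3, about 0.806
```

A model that ranks every fraudulent file above every compliant one reaches 1.0, while the value drops as compliant files creep up the ranking.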

The proposed algorithms achieve an AUPRC of up to 0.851 for the sensitivity-optimised random forest. This is an excellent result, which points to particularly useful prospects for detecting potential fraud using artificial intelligence.

Is AI enough on its own?

No. The fight against fraud cannot be based on simple detection algorithms; it must be integrated into a collective and human approach, because it is not just a technological issue. The detection of potential fraud must be corroborated by the action of a tax auditor, as part of a procedure that respects the rights of the taxpayer. This approach guarantees that the situation will be examined by people who will take account of tax case law, under the supervision of a judge.

It is therefore important to understand that the analysis of a case is entrusted to auditors on the basis of criteria such as skills, workload, professional interest, coverage of the tax system and so on. We have developed algorithms that aim to suggest a distribution of cases to the head of a team of auditors, who then has the final say. He or she may also take subjective criteria into account, such as the need to train new agents, even if the allocation of files would then no longer be ideal.
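One simple way to picture such a suggestion is a greedy allocation that matches each case to a qualified auditor with spare capacity and leaves the rest to the team head. This is a sketch under stated assumptions, not the allocation algorithm developed by the authors: the data layout, the skill names and the "least relative workload first" rule are all hypothetical.

```python
def suggest_allocation(cases, auditors):
    """Suggest an auditor for each case.

    cases: list of (case_id, required_skill, effort_hours)
    auditors: dict name -> {"skills": set of skills, "capacity": hours available}
    Returns a dict case_id -> auditor name, or None when nobody fits.
    """
    load = {name: 0.0 for name in auditors}
    allocation = {}
    # Place the largest cases first so they are not squeezed out by small ones.
    for case_id, skill, effort in sorted(cases, key=lambda c: -c[2]):
        candidates = [
            name for name, a in auditors.items()
            if skill in a["skills"] and load[name] + effort <= a["capacity"]
        ]
        if not candidates:
            allocation[case_id] = None  # left for the team head to decide
            continue
        # Prefer the auditor with the lowest relative workload (name breaks ties).
        chosen = min(candidates,
                     key=lambda n: (load[n] / auditors[n]["capacity"], n))
        load[chosen] += effort
        allocation[case_id] = chosen
    return allocation

auditors = {
    "A": {"skills": {"income", "property"}, "capacity": 40},
    "B": {"skills": {"income"}, "capacity": 20},
}
cases = [
    ("c1", "property", 30),
    ("c2", "income", 15),
    ("c3", "income", 10),
    ("c4", "property", 20),
]
allocation = suggest_allocation(cases, auditors)
```

The result is only a proposal: cases mapped to `None`, or any assignment the team head disagrees with (for instance to train a new agent), can be reassigned by hand.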

Finally, it is worth remembering that a fraud detection application must be integrated into an information system that ensures that all the administration's functions are carried out. Therefore, in addition to research work, operational implementation requires planning for both interconnections with other applications and the maintainability of the fraud detection application. The ability to integrate new, more powerful algorithms should also be planned for.

James Bowers

Legal disclaimer: The contents of this article are the sole responsibility of the author and are not intended for any purpose other than academic information and research.

Acknowledgements: The author would like to thank the CentraleSupélec students who worked on the project and all the co-authors with whom he carried out his research to contribute to academic research into fraud.

[1] https://www.insee.fr/fr/statistiques/6478533
[2] https://www.ccomptes.fr/system/files/2019-11/20191202-synthese-fraude-aux-prelevements-obligatoires.pdf
[3] Prolhac, J., Gaie, C. "Providing an open framework to facilitate tax fraud detection", International Journal of Computer Applications in Technology, in press, 2023, https://doi.org/10.1504/IJCAT.2023.10055494
[4] Gaie, C. (2023). Struggling Against Tax Fraud, a Holistic Approach Using Artificial Intelligence. In: Gaie, C., Mehta, M. (eds) Recent Advances in Data and Algorithms for e-Government. Artificial Intelligence-Enhanced Software and Systems Engineering, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-031-22408-9_4
[5] https://gitlab.com/jean.prolhac/detection-de-fraude/
[6] Singh, A., Narina, T. and Aakanksha, S. (2016) "A review of supervised machine learning algorithms", Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1310–1315. https://ieeexplore.ieee.org/abstract/document/7724478
