AI, a weapon against tax fraud

Christophe Gaie
Head of the Engineering and Digital Innovation Division at the Prime Minister's Office
Key takeaways
  • Tax fraud is a major issue: it accounts for a significant share of the tax gap, which is estimated at between 4% and 15% of the sums owed in various OECD countries.
  • In France, there is a desire to step up the fight against fraud, in particular by using artificial intelligence tools.
  • The CISIRH has developed an operational and theoretical framework for comparing different fraud detection algorithms around the world.
  • To combat tax fraud effectively, AI and algorithms will not be enough; this fight must be part of a collective and human approach.

Detecting tax fraud is a major challenge, particularly in the current context of high government deficits. Fraud accounts for a significant proportion of the tax gap, which is estimated at between 4% and 15% of the sums owed in various OECD countries. In France, for example, VAT fraud alone is estimated at €20–25bn [1]. The Cour des Comptes has accordingly published numerous studies highlighting the importance of stepping up the fight against fraud [2]. In France, tax fraud is monitored by the DGFiP, which uses several artificial intelligence tools that have produced very promising results.

With this in mind, Christophe Gaie set up a project group with students from CentraleSupélec. Together, they carried out a research study aimed at building an operational framework (methodology, algorithmic approach, computer code, simulation data, etc.) and sharing it with all those involved in the fight against fraud [3].

What was the aim of this study?

This project continues more theoretical research that helped to define and articulate the various concepts, issues and directions in the field [4]. It puts that theory into practice by proposing an operational framework in which researchers from all over the world can develop and compare their algorithms.

Since tax optimisation is not prohibited, our work has focused on fraud in the sense of irregularities. We have also concentrated on detecting fraud perpetrated by individuals, as fraud by legal entities can be dealt with elsewhere.

Where does your database for this study come from?

A tax file can contain a lot of data relating to the individual: family situation, income, assets, etc. Whether in the laboratory or when studying real data, it is not always possible to have all of this data available. We therefore created a fictitious database built on a pre-selected set of variables: socio-professional category, income, expenditure and value of property. This database can of course be extended at a later date.

For perfectly valid reasons of personal data confidentiality, the DGFiP cannot make data available for research into tax fraud detection. As a result, each researcher builds his or her own database independently, which is detrimental for two reasons. First, it is time-consuming and requires the researcher to master concepts such as income and assets before any detection work can begin. Second, researchers’ algorithms are not necessarily comparable with each other, whereas shared reference databases are a classic approach in the field of digital research (databases of references, telecommunication signals or images, for example).

How does this AI detect fraud?

The AI is based on a tax file model that selects the files to be checked according to configurable criteria. Drawing on our knowledge of the majority of fraud cases, we have defined a taxpayer’s likelihood of fraud according to different typologies:

  • High expenses and/or high assets in relation to income,
  • Low expenses and/or low assets in relation to income,
  • High wealth compared to similar people in the same socio-professional category.
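
To make the three typologies above concrete, here is a minimal sketch of how a file could be flagged. It is not the study's code: the `TaxFile` fields mirror the variables named in the article, but the function name, flag labels and every threshold are hypothetical values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaxFile:
    category: str    # socio-professional category
    income: float    # declared annual income
    expenses: float  # annual expenditure
    assets: float    # value of property and other assets


# Hypothetical thresholds (assumptions, not figures from the study).
HIGH_EXPENSE_RATIO = 1.5   # expenses far above income
HIGH_ASSET_RATIO = 20.0    # assets far above income
LOW_EXPENSE_RATIO = 0.2    # expenses far below income
LOW_ASSET_RATIO = 0.1      # assets far below income
PEER_WEALTH_FACTOR = 3.0   # wealth several times the category median


def suspicion_flags(f: TaxFile, median_assets: dict[str, float]) -> list[str]:
    """Return the typologies that a file matches (empty list if none)."""
    income = max(f.income, 1.0)  # guard against division by zero
    expense_ratio = f.expenses / income
    asset_ratio = f.assets / income
    flags = []
    if expense_ratio > HIGH_EXPENSE_RATIO or asset_ratio > HIGH_ASSET_RATIO:
        flags.append("high_vs_income")
    if expense_ratio < LOW_EXPENSE_RATIO or asset_ratio < LOW_ASSET_RATIO:
        flags.append("low_vs_income")
    if f.assets > PEER_WEALTH_FACTOR * median_assets[f.category]:
        flags.append("high_vs_peers")
    return flags
```

In a real system these rules would more plausibly serve to label simulated files or to explain a model's output than to make decisions on their own.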

The dataset [5] was compiled using reference data published by INSEE, taking into account the distribution of socio-professional categories, the distribution of income and wealth, and the distribution of expenditure across these categories. Socio-professional categories are assigned in proportion to their actual share of the population. For the other parameters, we used a Singh-Maddala distribution [6].
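
As an illustration of this kind of simulation, the sketch below draws categories in proportion to fixed shares and samples incomes from a Singh-Maddala distribution by inverting its cumulative distribution function, F(x) = 1 - (1 + (x/b)^a)^(-q). The category names, shares and parameters (a, b, q) are invented placeholders, not INSEE figures, and the function names are hypothetical.

```python
import random


def sample_singh_maddala(a: float, b: float, q: float, rng: random.Random) -> float:
    """Draw one value by inverting F(x) = 1 - (1 + (x/b)**a) ** (-q)."""
    u = rng.random()  # uniform in [0, 1)
    return b * ((1.0 - u) ** (-1.0 / q) - 1.0) ** (1.0 / a)


# Placeholder shares and parameters (assumptions for illustration only).
CATEGORY_SHARES = {"employee": 0.5, "executive": 0.2, "retired": 0.3}
INCOME_PARAMS = {
    "employee": (3.0, 25_000.0, 1.2),
    "executive": (3.0, 50_000.0, 1.2),
    "retired": (3.0, 20_000.0, 1.2),
}


def make_taxpayers(n: int, seed: int = 0) -> list[dict]:
    """Generate n fictitious taxpayers with a category and an income."""
    rng = random.Random(seed)
    names = list(CATEGORY_SHARES)
    weights = [CATEGORY_SHARES[c] for c in names]
    rows = []
    for _ in range(n):
        category = rng.choices(names, weights=weights)[0]
        a, b, q = INCOME_PARAMS[category]
        rows.append({"category": category,
                     "income": sample_singh_maddala(a, b, q, rng)})
    return rows
```

Fixing the seed makes the generated database reproducible, which is what lets different researchers compare algorithms on the same data.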

We have developed different types of algorithms to detect potential fraud cases: some are based on neural networks with different sampling strategies, others on a random forest, i.e. a collection of decision trees used to solve a classification problem.
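
To show the idea behind a random forest, here is a deliberately tiny sketch in which each "tree" is a single split (a decision stump) trained on a bootstrap sample and on one randomly chosen feature, with the final answer decided by majority vote. This is a simplified stand-in written for this article, not the study's implementation, which would normally rely on full decision trees from a machine learning library. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimises misclassifications."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # label predicted when value > threshold
            errors = sum(
                (above if r[feature] > threshold else 1 - above) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    return feature, best[1], best[2]


def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps (1 = potential fraud, 0 = no flag)."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]
```

Each row is a tuple of numeric features, for example an expenses-to-income ratio and an assets-to-income ratio, and each label is 0 or 1.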

Have these algorithms been used on real cases?

Although the algorithms have not been run on real data, it is quite possible to share them with public officials, in particular those of the DGFiP’s SJCF-1D “Control programming and data analysis” office, where one of the students subsequently completed an internship. Any collaboration with, or feedback from, a public entity would be a welcome opportunity.

What is the level of accuracy?

It is important to remember that detection involves a trade-off between precision (i.e. the share of correct predictions among the cases flagged as positive) and sensitivity, also called recall (i.e. the share of truly positive individuals that the model detects). The results of an algorithm are therefore expressed with a metric that accounts for this trade-off: the AUPRC, or “area under the precision-recall curve”.
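
For readers who want to see how these quantities are computed, here is a minimal sketch. Precision and sensitivity come from counts of true positives, false positives and false negatives; the area under the precision-recall curve is approximated by the usual step-wise "average precision" sum, assuming no two files share the same score. The function names are illustrative, and a production system would more likely use a library routine.

```python
def precision_recall(y_true, y_pred):
    """Precision and sensitivity (recall) for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    # Rank files from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            # precision at this rank times the recall gained by this hit
            area += (tp / rank) * (1 / total_pos)
    return area
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the two frauds are ranked first and third, giving 1/2 × (1/1 + 2/3) = 5/6, or about 0.833.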

The proposed algorithms achieve an AUPRC of up to 0.851, obtained with the sensitivity-optimised random forest. This is an excellent result, which points to particularly promising prospects for detecting potential fraud using artificial intelligence.

Is AI enough on its own?

No. The fight against fraud cannot be based on detection algorithms alone; it must be integrated into a collective and human approach, because it is not just a technological issue. Any potential fraud that is detected must be corroborated by a tax auditor, as part of a procedure that respects the rights of the taxpayer. This approach guarantees that the situation will be examined by people who take account of tax case law, under the supervision of a judge.

It is therefore important to understand that a case is assigned to auditors on the basis of criteria such as skills, workload, professional interest, coverage of the tax system and so on. We have developed algorithms that suggest a distribution of cases to the head of a team of auditors, who then has the final say. He or she may also take subjective criteria into account, such as the need to train new agents, even if the resulting allocation of files is then no longer ideal.
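
One simple way to picture such a suggestion is a greedy rule: give each case to the qualified auditor who currently has the lightest relative workload, and leave the case unassigned when nobody qualifies. The sketch below uses only two of the criteria mentioned above (skills and workload); it is an assumption-laden illustration, not the algorithm developed in the study, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Auditor:
    name: str
    skills: set[str]
    capacity: int                                   # maximum number of files
    files: list[str] = field(default_factory=list)  # files already assigned


def suggest_allocation(cases: list[tuple[str, str]],
                       auditors: list[Auditor]) -> dict[str, Optional[str]]:
    """Map each case id to a suggested auditor name, or None if nobody fits."""
    suggestion: dict[str, Optional[str]] = {}
    for case_id, required_skill in cases:
        candidates = [a for a in auditors
                      if required_skill in a.skills and len(a.files) < a.capacity]
        if not candidates:
            suggestion[case_id] = None  # left to the team head's judgement
            continue
        # Lightest relative workload first; ties go to the first auditor listed.
        chosen = min(candidates, key=lambda a: len(a.files) / a.capacity)
        chosen.files.append(case_id)
        suggestion[case_id] = chosen.name
    return suggestion
```

The output is only a proposal: the team head can override any line, for instance to pair a new agent with an experienced colleague.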

Finally, it is worth remembering that a fraud detection application must be integrated into an information system that supports all of the administration’s functions. Beyond the research work, operational implementation therefore requires planning both the interconnections with other applications and the maintainability of the fraud detection application. Similarly, the ability to integrate new, more powerful algorithms should be specified.

James Bowers

Legal disclaimer: The contents of this article are the sole responsibility of the author and are not intended for any purpose other than academic information and research.

Acknowledgements: The author would like to thank the CentraleSupélec students who worked on the project and all the co-authors with whom he has contributed to academic research into fraud.

[1] https://www.insee.fr/fr/statistiques/6478533
[2] https://www.ccomptes.fr/system/files/2019-11/20191202-synthese-fraude-aux-prelevements-obligatoires.pdf
[3] Prolhac, J., Gaie, C. “Providing an open framework to facilitate tax fraud detection”, International Journal of Computer Applications in Technology, in press, 2023, https://doi.org/10.1504/IJCAT.2023.10055494
[4] Gaie, C. (2023). Struggling Against Tax Fraud, a Holistic Approach Using Artificial Intelligence. In: Gaie, C., Mehta, M. (eds) Recent Advances in Data and Algorithms for e-Government. Artificial Intelligence-Enhanced Software and Systems Engineering, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-031-22408-9_4
[5] https://gitlab.com/jean.prolhac/detection-de-fraude/
[6] Singh, A., Narina, T. and Aakanksha, S. (2016) “A review of supervised machine learning algorithms”, Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp.1310–1315. https://ieeexplore.ieee.org/abstract/document/7724478
