
How open-source AI could modernise public services

Christophe Gaie
Head of the Engineering and Digital Innovation Division at the Prime Minister's Office
Laurent Denis
Technical Architect in the Prime Minister's Office
Key takeaways
  • AI, and LLMs in particular, represent a major opportunity to transform public action, notably by improving the quality and efficiency of services.
  • Open-source AI appears to be a promising option for modernising digital public services, though its risks remain to be assessed.
  • Open-source AI has many advantages, including full transparency of the source code, reduced costs and the independence of administrations from software publishers.
  • Closed AI models also have advantages, such as lower exposure to tampering with model parameters and tighter publisher control over how the AI operates.
  • It is essential to conduct an in-depth study of the ethical issues involved in using AI in the public sector, particularly to guard against certain biases.

Artificial intelligence (AI), and more specifically large language models (LLMs), represents a major opportunity to transform public services. AI can be used in many areas to improve efficiency, the quality of services provided to citizens and decision-making. However, implementing AI in public services presents major challenges. First, the chosen solution must guarantee fair treatment and transparency of the decisions and actions taken on a case, and ensure respect for fundamental rights throughout its use. In addition, the rigorous protection of personal data, which is often sensitive in the context of public services, is a significant security issue. Finally, the transparency of decisions is a major factor in the trust placed in the solutions used and in their acceptability to citizens. A solution offering a high level of transparency is therefore an asset in the implementation and acceptance of artificial intelligence. Yet given the complexity of the subject, the criteria for ensuring the expected level of transparency are far from easy to define.

The definition of free AI is still a subject of debate

Large language models are based on neural networks trained on very large amounts of data. Given a sequence of words, they statistically determine the word that best continues the sequence. By applying this principle recursively, LLMs are able to produce structured texts, giving the impression that the machine is analysing and understanding the question asked.
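As an illustration of this recursive principle, the sketch below generates text one token at a time by always appending the most probable next word. It assumes the Hugging Face transformers library and uses GPT-2 purely as a stand-in model; neither is prescribed here.

    # Minimal sketch of autoregressive generation: the model scores every
    # possible next token, the most probable one is appended, and the loop
    # repeats. Assumes: pip install torch transformers; GPT-2 is a stand-in.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tokenizer("Public services could", return_tensors="pt").input_ids
    for _ in range(20):                   # produce 20 tokens, one at a time
        logits = model(ids).logits        # a score for every vocabulary token
        next_id = logits[0, -1].argmax()  # greedy choice: most probable token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    print(tokenizer.decode(ids[0]))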

The text produced will therefore depend on:

  • the algorithms used, which enable the model to weigh the importance of each word in a sentence relative to the others, a capacity provided in particular by "transformer" architectures[1] (a minimal sketch follows this list);
  • the weights assigned to the different neurons, which enable the network to be activated in order to produce the output data;
  • the learning corpus, which has a direct impact on the determination of the weights used by the model.
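The "weighing the importance of each word" mentioned in the first point can be shown in a few lines. The sketch below implements scaled dot-product attention, the core operation of transformer architectures; the toy matrices are arbitrary stand-ins, not real model weights.

    # Minimal sketch of scaled dot-product attention: each word's query is
    # compared with every word's key, the scores become weights via a softmax,
    # and the output is the correspondingly weighted mix of the values.
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise relevance
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
        return weights @ V

    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(3, 4))  # 3 toy "words", embedding size 4
    print(attention(Q, K, V))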

The four principles (use, study, modify, share) associated with free software[2] must therefore be applied to all these elements[3]. This is still a subject of debate and is therefore a source of much confusion[4]. For example, some AIs claiming to be free have usage restrictions that go against the defined principles[5]. After a long process, the Open Source Initiative (OSI), which brings together researchers, lawyers, policymakers, activists and representatives of large technology companies, has proposed a definition that correlates the four freedoms associated with free software with the elements on which machine learning models are based.

According to the Open Source Initiative, a free machine learning system must include the following elements[6]:

  • sufficiently detailed information on the data used to train the system, enabling a competent person to build a substantially equivalent system; this information must be available under terms approved by the OSI;
  • the source code of the AI, including the inference code used to execute the model;
  • all the learned parameters that are superimposed on the model architecture to produce an output from a given input.

The publication of the learning corpus is therefore not compulsory, but a detailed description of it must be provided. It is clear that many models offering excellent performance and describing themselves as open source do not comply with this last point; these are referred to as open-weight models. A comparison of AI models is also available from the Pôle d'Expertise de la Régulation Numérique (PEReN).

What are the risks and benefits associated with the different types of licences?

The source code is human-readable and provides access to the algorithms used. The weights are the result of training and represent the knowledge of the model. In the case of open-weight models, this knowledge can be customised through a fine-tuning process[7].
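As a sketch of what such customisation can look like in practice, the lines below attach LoRA adapters to an open-weight model so that only a small set of additional parameters is trained. The model name and target modules are illustrative assumptions, not a recommendation.

    # Minimal sketch of fine-tuning an open-weight model with LoRA adapters.
    # Assumes: pip install transformers peft; the model name is illustrative.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    # LoRA trains small low-rank matrices added to chosen layers instead of
    # updating all the weights, which keeps customisation affordable.
    config = LoraConfig(r=8, lora_alpha=16,
                        target_modules=["q_proj", "v_proj"],  # assumed layers
                        task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()
    # A standard training loop over the domain corpus would follow here.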

However, open weights alone do not allow for total transparency: they do not make it possible, for example, to detect bias or "poisoning" attacks, which consist of altering the knowledge of a model without these modifications being easily detectable by standard tests[8][9]. Only a free model that provides access to its learning corpus guarantees total transparency, in particular by allowing complete control over its training. Retraining a model from its sources, however, still requires significant computing resources that few entities are able to acquire.

On 30 October 2023, President Biden issued an executive order entitled Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which mandated an assessment of the risks and benefits of foundation models whose weights are available. The resulting report[10] recognises the benefits of open access to model weights, such as innovation and research, but also highlights the potential risks, including malicious use, the removal of safety mechanisms and the impact on competition. It concludes that current data is insufficient to determine definitively whether restrictions on open-weight models are justified, and recommends active monitoring of these models.

Closed models, even if they do not offer the same level of transparency and adaptability as their free or open-weight counterparts, are not without advantages. They are less exposed to the manipulation risks mentioned above, because their weights cannot be modified by a third party; the intellectual-property risks attached to the training data are borne by the model supplier; and the publisher can act quickly on its model in the event of abuse, helping to mitigate potential risks such as the dissemination of inappropriate content[11]. All of this, however, comes at the expense of the autonomy one can have over the AI model.

Should priority be given to AI with an open licence?

The use of open-source AI as defined by the OSI has many advantages. First of all, the transparency of its operation is guaranteed, since it is possible to directly access and modify the source code and to inspect the training data.

This possibility is a fundamental guarantee, since each model used can be subjected to in-depth verification to ensure that its decision-making process complies with current legislation and does not present discriminatory bias, for example. On the other hand, when AI is used for retrieval-augmented generation (RAG[12]), the required level of transparency may be lower, because the data used to formulate the responses is supplied by an algorithm over which the expected level of control is easier to obtain. As the body of answers is provided by conventional search algorithms, it is relatively easy to present the end user with the expected answer, the raw data and their level of confidence. This still requires a critical eye on the part of the end user, however.
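A minimal sketch of this pattern could look as follows; `retrieve` and `llm` are assumed placeholders for a document search function and a language model call, not a specific library API. The function returns the answer together with the raw excerpts and their retrieval scores, as described above.

    # Minimal sketch of retrieval-augmented generation (RAG): retrieved
    # excerpts are injected into the prompt, and each excerpt is returned to
    # the user with its score. `retrieve` and `llm` are placeholders.
    def answer_with_sources(question, retrieve, llm, k=3):
        passages = retrieve(question, k)  # [(text, score), ...] from an index
        context = "\n".join(f"[{i}] {text}"
                            for i, (text, _) in enumerate(passages, 1))
        prompt = ("Answer using only the numbered excerpts below.\n"
                  f"{context}\n\nQuestion: {question}\nAnswer:")
        return {"answer": llm(prompt),
                "sources": [{"text": t, "confidence": s}
                            for t, s in passages]}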

Even if the State's missions are by nature relatively specific, many use cases are similar to those found in private companies, namely answering a question from a corpus of documents with the help of classic or vector search algorithms based on the concept of similarity[13]. It is therefore not surprising to see a convergence in the models used in both worlds. For the State, the deciding criterion in the choice of models will therefore be the protection of personal or sensitive information transmitted to AI models.
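To illustrate the similarity-based search mentioned above, the sketch below embeds a few documents and ranks them against a query by cosine similarity. The sentence-transformers library and the model name are assumptions made for the example, and the documents are invented placeholders.

    # Minimal sketch of vector search: texts are embedded and ranked by
    # cosine similarity. Assumes: pip install sentence-transformers numpy.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    docs = ["Decree on data protection",
            "Circular on public procurement",
            "Guidance on income tax"]
    doc_vecs = model.encode(docs)                    # one vector per document
    query_vec = model.encode("How is personal data protected?")

    unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
    scores = unit(doc_vecs) @ unit(query_vec)        # cosine similarities
    print(docs[int(scores.argmax())])                # best-matching document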


Beyond the aspects mentioned above, the use of open-source solutions also allows the State to disseminate its work so that it can be reused by the public or private sector. For example, the DGFiP has published work on a model for summarising parliamentary amendments[14][15]. The administration is thus able to actively share its knowledge, within the limits of the confidentiality required by sovereign missions.

Finally, the use of open-source solutions makes it possible to drastically reduce expenses, by limiting them to technical support, without licence costs.

Are there any difficulties in implementing AI under a free licence?

The use of AI under a free licence also presents various challenges. First of all, implementing free solutions requires a good understanding of how the underlying models work. On top of this complexity comes the need for technical skills: adapting the models to business needs, obtaining the data necessary for learning, configuring the model (fine-tuning) where the business application requires it, deploying it in the administration's information system, and guaranteeing the highest level of security.

In addition, ongoing and corrective maintenance requires a significant investment of time, both to update the models while ensuring a satisfactory level of non-regression and to verify that they function properly. Although the code is free, using these AIs often also requires IT infrastructure based on specialised computing units, which can represent an indirect cost. Finally, the quality of open-source models can vary considerably, particularly depending on the business cases to be addressed, and there is no absolute guarantee as to their performance. It is therefore essential to define expectations precisely with the business teams and to verify the expected results before any version is put into service.
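One way to make this verification systematic is a fixed battery of business questions whose expected key facts must survive every model update. A hedged sketch follows, in which `model_answer` is an assumed wrapper around the deployed model and the question/fact pairs are invented placeholders, not real business rules.

    # Minimal sketch of a non-regression check run before a new model version
    # is put into service. `model_answer` is an assumed wrapper; the cases
    # below are invented placeholders.
    REGRESSION_CASES = [
        ("What is the filing deadline?", "31 may"),
        ("Which form is used to declare income?", "form 2042"),
    ]

    def passes_non_regression(model_answer):
        failures = [q for q, expected in REGRESSION_CASES
                    if expected not in model_answer(q).lower()]
        for q in failures:
            print(f"REGRESSION: expected fact missing for: {q}")
        return not failures  # any lost fact blocks the release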

Conclusion

The integration of artificial intelligence within public services represents a unique opportunity to improve the efficiency and quality of services provided to citizens, and of decision-making, in a context of strain on available human resources. Free language models seem particularly well suited to this challenge.

Despite the challenges, the advantages of free AI are numerous: it promotes innovation, reduces costs and strengthens the autonomy of administrations.

However, it is essential to study in depth the ethical issues related to the use of AI in the public sector. Processes and methods must be put in place to guard against algorithmic bias and to guarantee the reasonable use of these technologies, ensuring that they are monitored by digital and legal experts, or even by citizens themselves.

Disclaimer: the content of this article is the sole responsibility of its authors and is intended solely for information and academic research.

[1] A. Vaswani et al., "Attention Is All You Need", 2023. [Online]. Available at: https://arxiv.org/abs/1706.03762
[2] "Logiciel libre", Wikipédia, 14 November 2024. [Online]. Available at: https://fr.wikipedia.org/w/index.php?title=Logiciel_libre&oldid=220293632
[3] B. Doerrfeld, "Be careful with 'open source' AI", LeadDev. [Online]. Available at: https://leaddev.com/technical-direction/be-careful-open-source-ai
[4] R. Williams, "We finally have a definition for open-source AI", MIT Technology Review. [Online]. Available at: https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/
[5] N. Lambert, "The koan of an open-source LLM", Interconnects. [Online]. Available at: https://www.interconnects.ai/p/an-open-source-llm
[6] "The Open Source AI Definition – 1.0", Open Source Initiative. [Online]. Available at: https://opensource.org/ai/open-source-ai-definition
[7] S. Le Calme, "L'équilibre délicat entre sécurité et innovation dans l'IA : 'bannir les modèles open weights serait un désastre'". [Online]. Available at: https://intelligence-artificielle.developpez.com/actu/356012/The-delicate-balance-between-safety-and-innovation-in-AI-banning-open-weights-models-would-be-a-disaster-according-to-a-researcher-the-Biden-administration-is-considering-blocking-access-to-these-models-to-prevent-les-abus/
[8] "PoisonGPT : des LLM détournés à la racine", Silicon.fr. [Online]. Available at: https://www.silicon.fr/Thematique/data-ia-1372/Breves/PoisonGPT-des-LLM-detournes-a-la-racine-402783.htm
[9] "LLM03: Training Data Poisoning", OWASP Top 10 for LLM & Generative AI Security. [Online]. Available at: https://genai.owasp.org/llmrisk/llm03-training-data-poisoning/
[10] NTIA, "Dual-Use Foundation Models with Widely Available Model Weights", July 2024. [Online]. Available at: https://www.ntia.gov/sites/default/files/publications/ntia-ai-open-model-report.pdf
[11] I. Solaiman, "Generative AI Systems Aren't Just Open or Closed Source", Wired. [Online]. Available at: https://www.wired.com/story/generative-ai-systems-arent-just-open-or-closed-source/
[12] "What is Retrieval-Augmented Generation (RAG)? The Complete Guide", K2view. [Online]. Available at: https://www.k2view.com/what-is-retrieval-augmented-generation
[13] M. Syed and E. Russi, "Qu'est-ce que la recherche vectorielle ?", IBM. [Online]. Available at: https://www.ibm.com/fr-fr/topics/vector-search
[14] J. Gesnouin et al., "LLaMandement: Large Language Models for Summarization of French Legislative Proposals", 2024. [Online]. Available at: https://arxiv.org/abs/2401.16182
[15] "LLaMandement, le LLM open source du gouvernement français", ActuIA. [Online]. Available at: https://www.actuia.com/actualite/llamandement-le-llm-open-source-du-gouvernement-francais/
