π Science and technology
Science at the service of creativity

AI, a high-potential tool for music creation

with Gaël Richard, Professor at Télécom Paris (IP Paris) and Scientific Co-director of the Hi! PARIS interdisciplinary center for artificial intelligence
On September 3rd, 2024
5 min reading time
Gaël Richard
Professor at Télécom Paris (IP Paris) and Scientific Co-director of the Hi! PARIS interdisciplinary center for artificial intelligence
Key takeaways
  • AI applied to sound makes it possible to analyse, transform and synthesise sound signals.
  • The applications are numerous, ranging from predictive maintenance to virtual reality enhancement and personal assistance.
  • AI algorithms applied to sound require specific methods due to the temporal and voluminous nature of sound data.
  • The challenges associated with sound AI include its ecological impact, copyright issues, ethical concerns, and the need for an appropriate legal framework.
  • The HI-Audio project combines machine learning and human knowledge to create AI models that are more interpretable and controllable.

For more than 20 years, researchers have been using artificial intelligence (AI) on sound signals. These sound signals can be speech, music, or environmental sounds. Recent advances in algorithms are opening the door to new fields of research and new applications.

How can artificial intelligence be used to process sound signals?

Firstly, AI can be used for sound analysis. In other words, based on a recording, the machine can recognise the sounds (which instrument is playing, which machine or object is generating which noise, etc.) and the recording conditions (live, studio, outside, etc.). For example, Shazam is a fairly simple but very well-known music recognition AI.
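To make this concrete, here is a minimal sketch of what sound analysis can look like in practice: short clips are summarised by standard audio features (MFCCs) and a classifier learns which instrument is playing. This is only an illustrative toy pipeline, not how Shazam or any production system actually works; the file names and labels are hypothetical, and it assumes the librosa and scikit-learn libraries are available.

```python
# Toy sound-analysis pipeline: recognise the instrument in short audio clips.
# Assumes librosa and scikit-learn are installed; file paths are placeholders.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path, sr=22050):
    """Summarise a clip as the mean of its MFCC frames."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Hypothetical labelled training clips (instrument recognition).
train_paths = ["piano_01.wav", "violin_01.wav", "piano_02.wav", "violin_02.wav"]
train_labels = ["piano", "violin", "piano", "violin"]

X = np.stack([clip_features(p) for p in train_paths])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, train_labels)

print(clf.predict([clip_features("unknown_clip.wav")]))
```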

AI can also be used to transform sound. For example, this involves separating the different sources of a sound recording so that they can be remixed differently (as with karaoke applications). It is also possible to consider transferring the musical style of a given sound recording or changing the acoustic conditions of the recording (for example by removing the reverberation while keeping the content intact). Finally, the third major area of sound processing using generative AI is synthesis. Given a musical extract or certain instructions, the machine can generate music in the style of the extract. It can also be asked to generate music in relation to a text or image.
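As a small illustration of the "transform" use case, the sketch below splits a recording into harmonic and percussive layers with a classic time-frequency masking technique and remixes them at new levels. The file name and gain values are placeholders, and state-of-the-art separation relies on learned models rather than this simple decomposition; the sketch assumes librosa and soundfile are installed.

```python
# Minimal source-separation sketch: harmonic/percussive decomposition, then remix.
import librosa
import soundfile as sf

y, sr = librosa.load("mix.wav", sr=None, mono=True)   # placeholder file name

# Harmonic/percussive separation via median filtering of the spectrogram.
y_harmonic, y_percussive = librosa.effects.hpss(y)

# "Remix": boost the harmonic layer, attenuate the percussive layer.
remix = 1.2 * y_harmonic + 0.5 * y_percussive
sf.write("remix.wav", remix, sr)
```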

I’m currently working on a major research project funded by the European Research Council (ERC) called HI-Audio, or “Hybrid and Interpretable Deep neural audio machines”. The term “hybrid” implies that instead of learning solely from large quantities of data, we are incorporating a priori information deduced from our knowledge into our learning models. We already have certain knowledge about sound: the type of musical instruments present, the level of reverberation in a room, etc. The idea is to use this knowledge as a basis for relatively simple models that describe these phenomena. Then we insert them into neural networks and more complex models that allow us to learn and describe what we don’t know. The result is models that combine interpretability and controllability.
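The following conceptual sketch shows the general idea of such hybrid models, not the actual HI-Audio architectures: a small neural network predicts the parameters of a simple, interpretable harmonic signal model (fundamental frequency and harmonic amplitudes), and that signal model, rather than the network itself, produces the waveform. It assumes PyTorch; all dimensions, ranges, and names are arbitrary.

```python
# Conceptual "hybrid" model: a neural network outputs interpretable parameters
# for a hand-designed harmonic synthesiser. Illustration only, not HI-Audio.
import math
import torch
import torch.nn as nn

SR = 16000          # sample rate
N_HARM = 8          # number of harmonics in the signal model
FRAME = 1024        # samples synthesised per parameter frame

class ParamNet(nn.Module):
    """Maps an input feature vector to interpretable synthesis parameters."""
    def __init__(self, in_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1 + N_HARM))
    def forward(self, feats):
        out = self.net(feats)
        f0 = 80.0 + 400.0 * torch.sigmoid(out[..., :1])   # fundamental in Hz
        amps = torch.softmax(out[..., 1:], dim=-1)        # harmonic weights
        return f0, amps

def harmonic_synth(f0, amps):
    """Interpretable, physics-inspired part: a sum of harmonic sinusoids."""
    t = torch.arange(FRAME) / SR
    k = torch.arange(1, N_HARM + 1).view(-1, 1)           # harmonic numbers
    phases = 2 * math.pi * f0.view(-1, 1, 1) * k * t       # (batch, harm, time)
    return (amps.unsqueeze(-1) * torch.sin(phases)).sum(dim=1)

feats = torch.randn(4, 64)            # stand-in for learned audio features
f0, amps = ParamNet()(feats)
audio = harmonic_synth(f0, amps)      # (4, FRAME) waveform frames
print(audio.shape, f0.squeeze(-1))
```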

What are the specific features of AI algorithms applied to sound?

A sound signal is a temporal signal (a sequence of data ordered in time) that can be more or less periodic. First of all, each sound signal has its own specific characteristics. Recognising the instruments and notes in a musical recording requires advanced source separation techniques, making it possible to distinguish and isolate each sound element. Unlike speech, where a single instrument (the voice) conveys a linguistic message, musical analysis must manage the simultaneity and harmony of the instruments.
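A quick, self-contained illustration of this periodicity: the fundamental period of a synthetic note can be read off the autocorrelation of the signal. This is only a toy example in NumPy; real music analysis has to cope with several instruments sounding at once.

```python
# Estimate the fundamental of a synthetic 220 Hz tone from its autocorrelation.
import numpy as np

sr = 44_100
n = sr // 10                                   # 0.1 s of signal is enough here
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

# The autocorrelation of a periodic signal peaks at multiples of its period.
ac = np.correlate(tone, tone, mode="full")[n - 1:]
lag = np.argmax(ac[sr // 1000:]) + sr // 1000  # ignore very small lags
print(f"estimated fundamental: {sr / lag:.1f} Hz")   # close to 220 Hz
```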

Another specificity of music is the length of the recordings. In principle, this type of AI is trained in much the same way as for images or text. But unlike an image, a sound signal is a series of numbers, positive or negative, that vary over time around a reference value. For a CD-quality recording, there are 44,100 values per second of music. For one minute of recording, that already makes 2,646,000 values (44,100 × 60 seconds). Data volumes are therefore very high for a short period of time. Specific AI methods applied to sound are needed, as well as very powerful analysis resources to process this volume of data.
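For readers who want to check, the arithmetic is a one-liner; the memory estimate assumes standard 16-bit stereo CD audio and is only an order of magnitude.

```python
# Back-of-the-envelope check of the figures above.
SAMPLE_RATE = 44_100          # CD-quality samples per second
SECONDS = 60

values = SAMPLE_RATE * SECONDS
print(values)                              # 2646000
print(values * 2 * 2 / 1e6, "MB")          # stereo, 16-bit (2 bytes): ~10.6 MB
```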

Which application sectors could benefit from these developments in sound processing?

Sound signal processing, or more generally AI applied to sound, is already used in a variety of fields. First of all, there are industrial applications. Speech is very sensitive to reverberation, which can quickly affect intelligibility. It is necessary to “clean” the sound signal of environmental noise, particularly for telephone communications. Another area not to be overlooked is the usefulness of synthesised sound environments in the audiovisual industry. Recreating ambient sound allows you to suggest what is off-screen. Let’s imagine a film scene on a café terrace. We probably won’t know where the café is located: in the town centre, in a residential area, near a park, etc. Depending on the direction, sound can help immerse the viewer in a richer atmosphere. The same applies to video games and virtual reality. Hearing is one of the five senses, so we are very sensitive to sound. Adding sound enhancement increases realism and immersion in a virtual environment.
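As a rough sketch of what “cleaning” a speech signal can mean, the example below applies classic spectral subtraction: a noise spectrum estimated from a presumed-silent opening segment is subtracted from the whole recording. The file name and the half-second noise segment are assumptions, and modern systems use learned models; this only illustrates the principle, relying on librosa and soundfile.

```python
# Classic spectral subtraction: estimate a noise profile, subtract it, resynthesise.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("noisy_speech.wav", sr=None, mono=True)   # placeholder
S = librosa.stft(y, n_fft=1024, hop_length=256)
mag, phase = np.abs(S), np.angle(S)

# Assume the first 0.5 s contains only background noise.
noise_frames = int(0.5 * sr / 256)
noise_profile = mag[:, :noise_frames].mean(axis=1, keepdims=True)

# Subtract the noise estimate, clamp at zero, and rebuild the waveform.
clean_mag = np.maximum(mag - noise_profile, 0.0)
clean = librosa.istft(clean_mag * np.exp(1j * phase), hop_length=256)
sf.write("cleaned_speech.wav", clean, sr)
```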

With the development of AI applied to sound, new fields of application can be envisaged. I’m thinking particularly of predictive maintenance, meaning that we could use sound to detect when an object is starting to malfunction. Understanding the sound environment could also be useful in the development of self-driving cars. In addition to the information captured by its cameras, the car could steer itself according to the surrounding noise: bicycle bells, pedestrians’ reactions, and so on.
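A minimal sketch of the predictive-maintenance idea, under strong simplifying assumptions: average spectra recorded while a machine is known to be healthy serve as a reference, and new recordings that deviate too much from it are flagged. The file names, features, and threshold are all hypothetical; real acoustic monitoring systems use far richer models.

```python
# Toy acoustic anomaly detection for predictive maintenance.
import numpy as np
import librosa

def spectral_profile(path, sr=22050):
    """Average magnitude spectrum of a clip (a crude acoustic fingerprint)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    spec = np.abs(librosa.stft(y, n_fft=1024))
    return spec.mean(axis=1)

# Profiles recorded while the machine is known to be healthy (placeholders).
healthy = np.stack([spectral_profile(p) for p in
                    ["pump_ok_01.wav", "pump_ok_02.wav", "pump_ok_03.wav"]])
mean, std = healthy.mean(axis=0), healthy.std(axis=0) + 1e-8

def anomaly_score(path):
    """Mean absolute z-score of a new clip against the healthy profile."""
    return float(np.abs((spectral_profile(path) - mean) / std).mean())

score = anomaly_score("pump_today.wav")
print("possible fault" if score > 3.0 else "sounds normal", score)
```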

Let’s not forget that processing sound signals can become a tool for helping people. In the future, we can imagine an AI translating the sound environment into another modality, enabling deaf people to “hear” the world around them. Sound analysis could also help to protect people by detecting and characterising normal, abnormal, and alarming noises in the home. And that’s just a non-exhaustive list of possible applications!

What are the main challenges and issues linked to the development and use of AI in general and more specifically in the field of sound?

One of the main dilemmas is the ecological impact of such systems. The performance of generative AI in general is correlated with the amount of data ingested and computing power. Although we have so-called “frugal” approaches, the environmental and economic repercussions of these tools are non-negligible. This is where my research project comes in, as it explores an alternative, more frugal approach to hybrid AI.

Another concern for sound processing is access to music databases because of copyright issues. Overall, regulations can be an obstacle to the development of AI in France. In the United States, the notion of fair use allows a degree of flexibility in the use of copyrighted works. In Europe, we are juggling between several methods. All the same, there are a few public databases that contain royalty-free compositions written specifically for research purposes. Sometimes we work with companies like Deezer, which offer restricted access to their catalogues for specific projects.

AI applied to sound also poses certain specific ethical problems. In particular, there is the question of the music generated by the machine and the potential for plagiarism, since the machine may have been trained using well-known and protected music. Who owns the copyright to the music generated by the machine? What is the price of this automatically generated music? How transparent is the music creation process? Finally, there is the question of the controllability of AI or, more precisely, its explainability. We need to be able to explain the decisions taken by the machine. Let’s go back to our example of the autonomous car: we need to be able to determine why it chooses to turn at a given moment. “It was the most likely action,” is not a sufficient answer, particularly in the event of an accident. In my opinion, it is vital to integrate human knowledge into these AI systems and to ensure transparency in their use.

More generally, we need to build a legal framework for these constantly evolving technologies. But France and Europe sometimes tend to overregulate, hampering innovation and our international competitiveness. We need to identify and protect ourselves against the risks of abuse and the ethical risks of AI, which are real, but we also need to avoid overregulation.

Do you think AI will have an impact on musicians and the sound industry?

AI will have an impact everywhere: in all professions, all companies and all environments, including jobs in the music sector. Yes, it can raise concerns and questions, for instance among musicians and film sound engineers who fear they will be replaced. Some jobs may disappear, but others will be created.

In my view, AI is more a tool than a threat. It will open up a new range of possibilities. By making it possible to play together remotely, AI will be able to bring together communities of musicians across the planet. It can also help to democratise music learning, by creating fun, personalised remote “training courses”. It is also a fairly sophisticated composition tool that can stimulate artists’ creativity.

AI in itself is not creative. It reproduces and reshapes but creates nothing. Similarly, in my opinion, AI does not make art. It’s almost conceptually impossible for a machine to make art. Art, even if it’s not clearly defined, is personified; it’s a form of human communication. Today, AI, particularly AI applied to sound processing, is not capable of that.

Interview by Loraine Odot
