
AI, a high-potential tool for music creation

with Gaël Richard, Professor at Télécom Paris (IP Paris) and Scientific Co-director of the Hi! PARIS interdisciplinary center for artificial intelligence
On September 3rd, 2024
5 min reading time
Key takeaways
  • AI applied to sound makes it possible to analyse, transform and synthesise sound signals.
  • The applications are numerous, ranging from predictive maintenance to virtual reality enhancement and personal assistance.
  • AI algorithms applied to sound require specific methods due to the temporal and voluminous nature of sound data.
  • The challenges associated with sound AI include its ecological impact, copyright issues, ethical concerns, and the need for an appropriate legal framework.
  • The HI-Audio project combines machine learning and human knowledge to create AI models that are more interpretable and controllable.

For more than 20 years, researchers have been using artificial intelligence (AI) on sound signals. These sound signals can be speech, music, or environmental sounds. Recent advances in algorithms are opening the door to new fields of research and new applications.

How can artificial intelligence be used to process sound signals?

Firstly, AI can be used for sound analysis. In other words, based on a recording, the machine can recognise the sounds (which instrument is playing, which machine or object is generating which noise, etc.) and the recording conditions (live, studio, outside, etc.). For example, Shazam is a fairly simple but very well-known music recognition AI.

AI can also be used to transform sound. For example, this involves separating the different sources of a sound recording so that they can be remixed differently (as with karaoke applications). It is also possible to consider transferring the musical style of a given sound recording or changing the acoustic conditions of the recording (for example by removing the reverberation while keeping the content intact). Finally, the third major area of sound processing using generative AI is synthesis. Given a musical extract or certain instructions, the machine can generate music in the style of the extract. It can also be asked to generate music in relation to a text or image.
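To give a flavour of the source-separation idea, here is a deliberately simplified Python sketch: it assumes a toy mix of two pure tones and recovers one of them with a fixed frequency-domain mask. Real separation systems rely on learned models rather than hand-picked masks, so this is an illustration of the principle, not of any production method.

```python
import numpy as np

# Toy source separation: a mix of two pure tones (standing in for two
# "instruments"), separated with a binary mask in the frequency domain.
sr = 8000                              # sample rate in Hz (arbitrary choice)
t = np.arange(sr) / sr                 # one second of time samples
low = np.sin(2 * np.pi * 220 * t)      # source A: 220 Hz tone
high = np.sin(2 * np.pi * 1760 * t)    # source B: 1760 Hz tone
mix = low + high                       # the "recording" we want to unmix

spectrum = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), d=1 / sr)

# Binary mask: keep only frequencies below 1 kHz to recover source A.
mask = freqs < 1000
recovered_low = np.fft.irfft(spectrum * mask, n=len(mix))

# The recovered signal closely matches the original low tone.
print(np.max(np.abs(recovered_low - low)))
```

Because both tones fall on exact FFT bins here, the mask separates them almost perfectly; with real music, sources overlap in frequency, which is precisely why learned models are needed.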

I’m currently working on a major research project funded by the European Research Council (ERC) called HI-Audio, or “Hybrid and Interpretable Deep neural audio machines”. The term “hybrid” implies that instead of learning solely from large quantities of data, we are incorporating a priori information deduced from our knowledge into our learning models. We already have certain knowledge about sound: the type of musical instruments present, the level of reverberation in a room, etc. The idea is to use this knowledge as a basis for relatively simple models that describe these phenomena. Then we insert them into neural networks and more complex models that allow us to learn and describe what we don’t know. The result is models that combine interpretability and controllability.

What are the specific features of AI algorithms applied to sound?

A sound signal is a temporal signal (a sequence of data ordered in time) that can be more or less periodic. First of all, each sound signal has its own specific characteristics. Recognising the instruments and notes in a musical recording requires advanced source separation techniques, making it possible to distinguish and isolate each sound element. Unlike speech, where a single instrument (the voice) conveys a linguistic message, musical analysis must manage the simultaneity and harmony of the instruments.

Another specific feature of music is the length of the recordings. In principle, this type of AI is trained in much the same way as for images or text. But unlike an image, a sound signal is a series of numbers, positive or negative, that vary over time around a reference value. A CD-quality recording contains 44,100 values per second, so a single minute of recording already represents 2,646,000 values (44,100 x 60 seconds). Data volumes are very high for a short period of time. It is therefore necessary to have specific AI methods applied to sound, but also very powerful analysis resources to be able to process this volume of data.
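The arithmetic above is easy to check in a few lines of Python; the sample rate and durations are taken from the text (44,100 values per second is the standard CD sample rate for one channel):

```python
# Number of amplitude values in a mono CD-quality recording.
SAMPLE_RATE_HZ = 44_100

def sample_count(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of amplitude values for a given duration."""
    return int(seconds * sample_rate_hz)

print(sample_count(1))    # one second of audio: 44,100 values
print(sample_count(60))   # one minute: 44,100 x 60 = 2,646,000 values
```

A stereo recording doubles these figures again, which underlines the point about data volumes.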

Which application sectors could benefit from these developments in sound processing?

Sound signal processing, or more generally AI applied to sound, is already used in a variety of fields. First of all, there are industrial applications. Speech is very sensitive to reverberation, which can quickly affect intelligibility. It is necessary to “clean” the sound signal of environmental noise, particularly for telephone communications. Another area not to be overlooked is the usefulness of synthesised sound environments in the audiovisual industry. Recreating ambient sound allows you to suggest what is off-screen. Let’s imagine a film scene on a café terrace. We probably won’t know where the café is located: in the town centre, in a residential area, near a park, etc. Depending on the direction, sound can help immerse the viewer in a richer atmosphere. The same applies to video games and virtual reality. Sound is one of the five senses, so we are very sensitive to it. Adding sound enhancement increases realism and immersion in a virtual environment.

With the development of AI applied to sound, new fields of application can be envisaged. I’m thinking particularly of predictive maintenance, meaning that we could use sound to detect when an object is starting to malfunction. Understanding the sound environment could also be useful in the development of self-driving cars. In addition to the information captured by its cameras, the car would be able to steer itself according to the surrounding noise: bicycle bells, pedestrians’ reactions, and so on.

Let’s not forget that processing sound signals can become a tool for helping people. In the future, we can imagine an AI translating the sound environment into another modality, enabling deaf people to “hear” the world around them. Similarly, sound analysis may help to protect people at home by detecting and characterising normal, abnormal, and alarming noises in the home. And that’s just a non-exhaustive list of possible applications!

What are the main challenges and issues linked to the development and use of AI in general and more specifically in the field of sound?

One of the main dilemmas is the ecological impact of such systems. The performance of generative AI in general is correlated with the amount of data ingested and computing power. Although we have so-called “frugal” approaches, the environmental and economic repercussions of these tools are non-negligible. This is where my research project comes in, as it explores an alternative, more frugal approach to hybrid AI.

Another concern for sound processing is access to music databases because of copyright issues. Overall, regulations can be an obstacle to the development of AI in France. In the United States, the notion of fair use allows a degree of flexibility in the use of copyrighted works. In Europe, we are juggling several approaches. All the same, there are a few public databases that contain royalty-free compositions written specifically for research purposes. Sometimes we work with companies like Deezer, which offer restricted access to their catalogues for specific projects.

AI applied to sound also poses certain specific ethical problems. In particular, there is the question of the music generated by the machine and the potential for plagiarism, since the machine may have been trained using well-known and protected music. Who owns the copyright to the music generated by the machine? What is the price of this automatically generated music? How transparent is the music creation process? Finally, there is the question of the controllability of AI or, more precisely, its explainability. We need to be able to explain the decisions taken by the machine. Let’s go back to our example of the autonomous car: we need to be able to determine why it chooses to turn at a given moment. “It was the most likely action,” is not a sufficient answer, particularly in the event of an accident. In my opinion, it is vital to integrate human knowledge into these AI systems and to ensure transparency in their use.

More generally, we need to build a legal framework for these constantly evolving technologies. But France and Europe sometimes tend to overregulate, hampering innovation and our international competitiveness. We need to identify and protect ourselves against the risks of abuse and the ethical risks of AI, which are real, but we also need to avoid overregulation.

Do you think AI will have an impact on musicians and the sound industry?

AI will have an impact everywhere. In all professions, all companies and all environments, including jobs in the music sector. Yes, it can raise concerns and questions, particularly among musicians and film sound engineers who fear they will be replaced. Some jobs may disappear, but others will be created.

In my view, AI is more a tool than a threat. It will open up a new range of possibilities. By making it possible to play together remotely, AI will be able to bring together communities of musicians across the planet. It can also help to democratise music learning, by creating fun, personalised remote “training courses”. It is also a fairly sophisticated composition tool that can stimulate artists’ creativity.

AI in itself is not creative. It reproduces and reshapes but creates nothing. Similarly, in my opinion, AI does not make art. It’s almost conceptually impossible for a machine to make art. Art, even if it’s not clearly defined, is personified; it’s a form of human communication. Today, AI, particularly AI applied to sound processing, is not capable of that.

Interview by Loraine Odot
