
Generative AI: what are the next steps?

Andrew Rogoyski
Innovation Director for the Surrey Institute for People-Centred AI
Key takeaways
  • AI is moving forward at breakneck speed, and its pace of development is unlikely to slow.
  • Some developments, like multimodal AI, AI agents, and AI-optimised chips are just around the corner.
  • However, the development of AI is not yet profitable and is dominated by a few large commercial organisations.
  • Bigger leaps like AI-powered robots and AI mentors are further away, but likely to happen.
  • With these developments, regulatory bodies need to keep up.

AI has been a long time coming. But over the past couple of years, while it’s been in the public eye, it has seemed to advance at warp speed. Andrew Rogoyski shares his insights into what to expect next. What powerful new features are just over the hill for AI?

We should explain that when we use the term “AI”, we’re currently mostly focusing this discussion on “Generative AI” or “GenAI”, which platforms like OpenAI’s ChatGPT have brought to the world in the last two years. Further big advancements, pushed forward by actors all around the world, are likely to come out soon. These already have a roadmap.

One of these is AI becoming increasingly multimodal. That means that large language models (LLMs) will learn and understand text, video, and sound, and how they relate to each other. Some models are already breaching that barrier and reaching the market. Single-mode AIs like Copilot can generate images from text and vice versa. Sora can generate video from text. Runway and Pika Labs are also offering image-to-video generation. The newer Large Multimodal Models (LMMs) from OpenAI, Meta, Google, and others can generate video from an image, text, and other data modes. For example, some GenAI models will answer text questions about the content of videos. Many industries are being affected, with studios in Hollywood rapidly assessing what this could mean for the movie industry. One of the downsides of this powerful technology is that you can create fairly intricate deepfakes on smaller budgets.
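To make the multimodal idea concrete, here is a minimal sketch of asking a model a text question about an image, using OpenAI’s Python SDK. The model name and image URL are illustrative assumptions, not a recommendation.

```python
# A minimal sketch of multimodal prompting: a text question about an
# image in a single request. Assumes the OpenAI Python SDK and an API
# key in the environment; model name and URL are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # a multimodal (text + image) model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frame.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```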

Another big, expected advance will be AI becoming an invisible tool. Instead of having to log on to a dedicated platform on a computer or phone, we’ll be able to converse with our cars, phones, and appliances, and get very natural answers. Several companies are working on this: Apple with Apple Intelligence, Google with Google AI, Amazon with Alexa, and others.

The next step then is having AI act as a sort of agent on your behalf, allowing it to book trips, hotel stays, and so on. At this point, GenAI isn’t very good at planning. That’s what OpenAI and others are working on: getting GenAI that can break down a problem into steps and take action on those steps. The question then is how much authority you give an agent to act on your behalf. It seems likely that such agents will be interacting with other agents, leading to entire AI discussions and negotiations taking place without human intervention.
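As an illustration of the agent pattern described here, the toy sketch below decomposes a goal into steps and dispatches each step to a tool. The plan() function and both tools are hypothetical stand-ins for real LLM and booking APIs, and the max_authority cap stands in for the delegation question raised above.

```python
# A toy sketch of the agent pattern: decompose a goal into steps, then
# dispatch each step to a tool. Everything here is a hypothetical
# stand-in, not a real API.
from typing import Callable

def book_flight(destination: str) -> str:
    return f"Flight booked to {destination}"

def book_hotel(city: str) -> str:
    return f"Hotel booked in {city}"

TOOLS: dict[str, Callable[[str], str]] = {
    "book_flight": book_flight,
    "book_hotel": book_hotel,
}

def plan(goal: str) -> list[tuple[str, str]]:
    """Stand-in for an LLM call that breaks a goal into (tool, arg) steps."""
    return [("book_flight", "Paris"), ("book_hotel", "Paris")]

def run_agent(goal: str, max_authority: int = 2) -> None:
    # The policy question from the text: how many actions may the agent
    # take on your behalf before a human has to approve?
    for tool_name, arg in plan(goal)[:max_authority]:
        print(TOOLS[tool_name](arg))

run_agent("Arrange a two-night trip to Paris")
```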

Another fairly big development will be the improvement of AI retrieval. That may sound boring, but it’s really exciting in terms of productivity. Corporations collect thousands of documents containing customer interactions, bids, policies, procedures, and other useful information. However, retrieval of such information is generally poor. GenAI may be the solution to the corporate “knowledge management” problem. Wouldn’t it be wonderful to be able to ask your laptop, “What was that big bid we did three years ago where we partnered with that bank?”, and have it infer the right answer and give you a summary rather than a string of documents you have to read through?
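A minimal sketch of the retrieval idea follows, under the assumption that a sentence-embedding model sits behind the embed() placeholder: documents are embedded once, the question is embedded at query time, and the closest matches are handed to the model to summarise.

```python
# A minimal sketch of semantic retrieval: embed documents once, embed
# the question, return the nearest matches. embed() is a placeholder
# for any sentence-embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

documents = [
    "2021 bid: partnership with Northbank on payments infrastructure.",
    "Travel policy: economy class for flights under six hours.",
    "2019 customer survey results for the retail division.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(question: str, k: int = 2) -> list[str]:
    scores = doc_vectors @ embed(question)  # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("What was that big bid where we partnered with a bank?"))
```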

Of course, before we can do this, we need to tackle AI hallucination: the false information generated by AI. We have developed a technology that will hallucinate images, sounds, poetry, and so on. But we are less keen on it hallucinating the company accounts or a medical record. The trick now will be to take that really nice conversational interface and link it to hard facts. Generative AI can create nonsense, which can be a big problem. Recently, Air Canada faced a small claims court case[1] from a passenger who tried to retroactively apply for a refund on his ticket after checking the company’s bereavement policy on their AI-powered chatbot. The AI hallucinated that passengers could claim back money within 90 days of travel, which isn’t in the company’s policy. The court sided with the passenger.
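One common mitigation, sketched below, is to constrain the model to answer only from retrieved policy text and to refuse otherwise. The ask_llm() call is a placeholder for any chat-completion API; the prompt wording is an assumption, not a guaranteed fix for hallucination.

```python
# A hedged sketch of "grounding": force answers to come from supplied
# policy text, with an explicit refusal otherwise. ask_llm() is a
# placeholder for a real chat-completion call.
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return "I don't know based on the provided policy."

def grounded_answer(question: str, policy_excerpts: list[str]) -> str:
    context = "\n".join(policy_excerpts)
    prompt = (
        "Answer ONLY using the policy text below. If the answer is not "
        "in the text, reply exactly: 'I don't know based on the provided "
        f"policy.'\n\nPolicy:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(grounded_answer(
    "Can I claim a bereavement refund within 90 days of travel?",
    ["Bereavement fares must be requested before travel."],
))
```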

Part of the move forward with AI will be limiting its cost, right?

Yes, the cost of running these models today, in terms of energy, cooling, and computing power, makes them unsustainable, both commercially and in the context of the climate crisis. Companies are likely to move from the existing graphics processing units (GPUs) to hardware designed around AI applications.

Apple have a “neural processing unit”, Google have a “Tensor Processing Unit”, and Microsoft, IBM, Amazon, Samsung, and others are all developing specialised hardware that can deliver performance hundreds or thousands of times more efficient than GPUs and CPUs. These chips are massively parallel and optimised for the matrix operations at the heart of machine learning algorithms.
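For a sense of what that workload looks like, the snippet below shows the core operation these chips accelerate: a dense neural-network layer is essentially one large matrix multiply followed by a nonlinearity, and AI chips win by running millions of such multiply-accumulates in parallel.

```python
# The matrix operation at the heart of machine learning: one dense
# layer is a batched matrix multiply plus a nonlinearity.
import numpy as np

batch, d_in, d_out = 32, 512, 256
x = np.random.randn(batch, d_in)     # a batch of input activations
W = np.random.randn(d_in, d_out)     # learned weights
b = np.zeros(d_out)                  # bias

y = np.maximum(x @ W + b, 0.0)       # matmul + ReLU: one dense layer
print(y.shape)                       # (32, 256)
```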

New chip architectures are also being proposed to run these models with very low energy. That’s the case for IBM’s NorthPole AI chip[2], for instance, which promises to reduce the power for typical applications by a factor of 25[3]. Google is also working on its Tensor Processing Unit to accelerate AI processing, and Groq’s Language Processing Unit is also showing promise.

Then there are more esoteric architectures, such as neuromorphic chips. These are designed to support so-called spiking neural networks: computing models that mimic the way human brains work. Those are mostly in the academic domain at the moment, but they are starting to move into other areas.
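A minimal sketch of the spiking idea, assuming the standard leaky integrate-and-fire model: the neuron accumulates input current, leaks charge over time, and emits a discrete spike whenever its membrane potential crosses a threshold.

```python
# A leaky integrate-and-fire neuron, the basic unit of a spiking neural
# network: integrate input, leak toward rest, fire and reset at threshold.
import numpy as np

def lif_neuron(current: np.ndarray, dt: float = 1.0, tau: float = 10.0,
               threshold: float = 1.0) -> list[int]:
    v, spikes = 0.0, []
    for t, i_in in enumerate(current):
        v += dt * (-v / tau + i_in)   # leak toward 0, integrate input
        if v >= threshold:
            spikes.append(t)          # emit a spike...
            v = 0.0                   # ...and reset the membrane
    return spikes

rng = np.random.default_rng(0)
print(lif_neuron(rng.uniform(0, 0.3, size=100)))  # spike times
```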

What about the fact that AI is so heavily dominated by a few commercial entities at the moment?

Currently, there is a big debate about opening up LLMs to open source. Due to the scale of operations needed to develop LLMs and LMMs, commercial organisations have been very much at the forefront of development. Around 80–90% of them are developed by commercial organisations. That means that the technology has remained mostly in the hands of its proprietors, with some notable exceptions like Meta’s LLaMA and Mistral’s Large and Codestral, which were made open source early on. There are also open-source community LLM/LMMs like Platypus, Bloom, and Falcon.

On the one hand, more people experimenting and playing with the technology could trigger new advances, expose vulnerabilities, and so on. On the other hand, there are people who will misuse that technology. There are currently fail-safes built into most of the models so that people can’t do whatever they want; however, they’re relatively easy to circumvent. And some open-source models are available in their “raw” state with no guardrails. We can expect that open-source GenAI will continue to grow. This goes hand in hand with the push to develop smaller, more sustainable models that don’t require hundreds of millions of dollars to run.
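In practice, “open” means anyone can download the weights and run them locally. A hedged sketch using the Hugging Face transformers library is below; the model name is one real open-weights example among many, and any small open chat model would do.

```python
# A sketch of running an open-weights model locally with Hugging Face
# transformers; the model name is illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # open-weights example
)
print(generator("Explain retrieval-augmented generation in one sentence.",
                max_new_tokens=60)[0]["generated_text"])
```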

What issues can be expected in terms of misuse of such new technology?

Cybersecurity will continue to be a huge issue. Criminal organisations are quickly learning to harness this technology for nefarious purposes. They have already started using generative AI to streamline online surveillance, to mine historical data for vulnerabilities, and to automate attacks with fake texts. Scammers are also using deepfakes to swindle money out of companies. The Hong Kong police recently made six arrests[4] in relation to an elaborate scam that robbed UK engineering firm Arup[5] of $25 million. One of the company’s workers was pulled into a video conference call with what he thought was his chief financial officer. This turned out to be a deepfake video. Deepfakes are also targeting voters’ intentions with misinformation. It’s a very dangerous trend and a real threat this year, with 2024 seeing more elections than humans have ever held in our history.

While cyber scammers will continue to improve, defenders on the other side are also learning, using generative AI and other forms of AI to find attackers. There’s this constant cycle of attack and defence in the cybersecurity world.

There is also a big discussion around the use of AI in a military context. AI is already used to analyse satellite imagery or provide navigation for drones, but it is not yet known to be used to take human life. At this point, it’s still cheaper not to put AI on drones, even if it is technically feasible. And that’s a very important line, in my view, not to cross. We don’t want to enter a world where you are fighting at machine speed and your adversary is an AI; it’s then a short step to the dystopian worlds of James Cameron’s Terminator movies or the Wachowskis’ Matrix series.

We are seeing some movement from regulatory bodies. Where do you expect that to go?

There is regulation starting to emerge. The European Union AI Act came into force[6] in August 2024, with the details being finalised in April this year. Everyone will be watching what impact the EU legislation has. A US presidential order published[7] in October 2023 introduced a long list of controls, including statutory reporting above a certain level of computing and networking power. We can expect more legislation to come out of the US, UK, and other countries soon.


Still, unless you hold those developing AI accountable, that regulation will only go so far. At the moment, it’s free rein. If the technology puts millions of people out of jobs or causes a mental health epidemic, corporations can shrug their shoulders and say they don’t control how people use the technology. On the other hand, if large corporates are the only organisations willing or able to invest the tens of billions necessary to develop these AI systems, nobody wants to stall this and risk falling behind other countries.

We need legislation and regulation where organisations and individuals are accountable for the impact of their technologies. That would make them think carefully about how their technology is going to be used and put the onus on them to properly explore and test its impact. You can see this is an area of tension for some of the GenAI companies; for example, OpenAI has lost several leading people[8] from the company, each of whom has hinted at the lack of oversight in GenAI development.

Anything else we should be looking out for?

There are advances that are over the horizon, but you can see that they will come. And those will be very significant. I think the convergence of quantum computing and AI will be interesting. Some companies like IBM are now bringing forward their roadmaps on quantum computing. IBM is now foreshadowing 200 qubits and 100 million computing gates by 2029[9]. That is very powerful technology that may allow AI to learn in real time, and that gets really exciting.
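For a sense of scale, the toy circuit below applies just three gates to two qubits, versus the 100 million gates on IBM’s 2029 roadmap. It assumes the qiskit package is installed.

```python
# A two-qubit, three-gate circuit (a Bell state plus measurement),
# purely to illustrate what a "gate" is at small scale.
from qiskit import QuantumCircuit

qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubits 0 and 1
qc.measure([0, 1], [0, 1])   # read both qubits out
print(qc.draw())
```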

Over the past 12 months or so, people have been applying the large language model approach to robotics, with so-called Vision-Language-Action models, or VLAs. In the same way that we’ve built foundation models for text and images, we may be able to build them for robotic perception, action, and movement. These aim to get to a place where, for instance, you can tell a robot to pick up a banana and it’s got enough general knowledge to not only spot the banana with its sensors but also figure out what to do with it, without requiring specific algorithmic input. It’s quite an interesting advancement in robotics because it also allows the AI to learn from physical and real-world experience.
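Purely as an illustration of the interface such models expose, the sketch below maps an instruction plus a camera frame to a low-level action. The VLAModel class is hypothetical; real examples of the approach include Google’s RT-2 and OpenVLA.

```python
# A hypothetical sketch of the VLA interface: (instruction, image) in,
# motor command out. Not a real library.
from dataclasses import dataclass

@dataclass
class Action:
    gripper: str                             # e.g. "open" / "close"
    delta_xyz: tuple[float, float, float]    # end-effector displacement

class VLAModel:
    """Stand-in for a vision-language-action policy."""
    def act(self, instruction: str, image: bytes) -> Action:
        # A real model maps (text, pixels) to motor commands end to end.
        return Action(gripper="close", delta_xyz=(0.0, 0.0, -0.05))

policy = VLAModel()
frame = b"...camera frame bytes..."
print(policy.act("pick up the banana", frame))
```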

AI mentors could be another big thing. AIs are already being used to generate learning material, but you can imagine a world where an AI scans your CV and is able to suggest training, reading material, and so on. AIs could also act as tutors, guiding you through education, suggesting ways of learning, conducting exams and assessments, and following your development. Schools are already piloting the use of GenAI as tutors; for example, David Game College in London[10] is trialling an accelerated GCSE in which students are only taught by AI. You’re then changing the entire educational loop.

The question might then be: why would you go to university? Why would you even go to school, apart from its social benefits? It could fundamentally change how we learn and teach. Some may be concerned that we start to build new education systems that are dependent on US tech companies, rather than on in-country qualified human beings.

What kind of timescale are we thinking of for these advancements?

I think if we’ve learned anything from the last couple of years, it’s that things can happen really fast. Things are never as far-fetched as we imagine them to be; science fiction has a disturbing habit of becoming science fact. I would say much of it is disturbingly close.

Now we need to start thinking about the consequences of this. What is humanity’s role in this future? What do economies look like if humans are taken out of the equation? What do truth and democracy look like when anything can be faked? What does education, the foundation of our modern quality of life, look like in the future? These are very big, fundamental questions that I think no one has the answer to at the moment.

Interview by Marianne Guenot
[1] https://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
[2] https://research.ibm.com/blog/northpole-ibm-ai-chip
[3] https://spectrum.ieee.org/neuromorphic-computing-ibm-northpole
[4] https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html
[5] https://www.ft.com/content/b977e8d4-664c-4ae4-8a8e-eb93bdf785ea
[6] https://commission.europa.eu/news/ai-act-enters-force-2024-08-01_en
[7] https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
[8] https://www.ft.com/content/638f67f7-5375-47fc-b3a7-af7c9e05b9e0
[9] https://www.ibm.com/roadmaps/quantum.pdf
[10] https://www.bbc.co.uk/sounds/play/m0021x2v
