
Generative AI: what are the next steps?

Andrew Rogoyski
Innovation Director for the Surrey Institute for People-Centred AI
Key takeaways
  • AI is moving forward at breakneck speed, and its pace of development is unlikely to slow.
  • Some developments, like multimodal AI, AI agents, and AI-optimised chips are just around the corner.
  • However, the development of AI is not yet profitable and is dominated by a few large commercial organisations.
  • Bigger leaps like AI-powered robots and AI mentors are further away, but likely to happen.
  • With these developments, regulatory bodies need to keep up.

AI has been a long time coming. But over the past couple of years, while it’s been in the public eye, it has seemed to advance at warp speed. Andrew Rogoyski shares his insights into what to expect next. What powerful new features can we expect just over the hill for AI?

We should explain that when we use the term “AI”, we’re currently mostly focusing this discussion on “generative AI” or “GenAI”, which platforms like OpenAI’s ChatGPT have brought to the world in the last two years. Further big advancements, pushed forward by actors all around the world, are likely to come out soon. These already have a roadmap.

One of these is AI becoming increasingly multimodal. That means that large language models (LLMs) will learn and understand text, video, and sound, and how they relate to each other. Some models are already breaking that barrier and reaching the markets. Single-mode AIs like Copilot can generate images from text and vice versa. Sora can generate video from text. Runway and Pika Labs are also offering image-to-video generation. The newer Large Multimodal Models (LMMs) from OpenAI, Meta, Google and others can generate video from an image, text, and other data modes. For example, some GenAI models will answer text questions about the content of videos. Many industries are being affected, with studios in Hollywood rapidly assessing what this could mean for the movie industry. One of the downsides of this powerful technology is that you can create fairly intricate deepfakes on smaller budgets.

Another big, expected advance will be AI becoming an invisible tool. Instead of having to log on to a dedicated platform on a computer or phone, we’ll be able to converse with our cars, phones, and appliances, and get very natural answers. Several companies are working on this: Apple with Apple Intelligence, Google with Google AI, Amazon with Alexa, and others.

The next step then is having AI act as a sort of agent on your behalf, allowing it to book trips, hotel stays and so on. At this point, GenAI isn’t very good at planning. That’s what OpenAI and others are working on: getting GenAI to break down a problem into steps and take action on those steps. The question then is how much authority you give an agent to act on your behalf. It seems likely that such agents will be interacting with other agents, leading to entire AI discussions and negotiations taking place without human intervention.
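The agent pattern described above — decompose a goal into steps, then act on each step, with a gate for how much authority the agent is given — can be sketched as a simple plan-and-execute loop. Everything here (the function names, the toy “tools”, the hard-coded planner) is illustrative, not any vendor’s actual API; in a real system the planner would be an LLM call.

```python
# Minimal plan-and-execute agent sketch: a "planner" breaks a goal into
# steps, and each step is dispatched to a tool the agent may invoke on
# the user's behalf.

def search_flights(destination):
    return f"flight to {destination} found"

def book_hotel(city):
    return f"hotel booked in {city}"

TOOLS = {"search_flights": search_flights, "book_hotel": book_hotel}

def plan(goal):
    # Stand-in for an LLM planner: returns (tool_name, argument) steps.
    return [("search_flights", "Paris"), ("book_hotel", "Paris")]

def run_agent(goal, approve=lambda step: True):
    results = []
    for tool_name, arg in plan(goal):
        # The authority question: a human (or policy) gate before acting.
        if not approve((tool_name, arg)):
            results.append((tool_name, "skipped: not authorised"))
            continue
        results.append((tool_name, TOOLS[tool_name](arg)))
    return results

print(run_agent("organise a trip to Paris"))
```

The `approve` callback is where the open question in the paragraph lives: pass a function that asks the user, checks a spending limit, or always refuses, and the agent’s autonomy changes accordingly.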

Another fairly big development will be the improvement of AI retrieval. That may sound boring, but it’s really exciting in terms of productivity. Corporations collect thousands of documents containing customer interactions, bids, policies, procedures, and other useful information. However, retrieval of such information is generally poor. GenAI may be the solution to the corporate “knowledge management” problem. Wouldn’t it be wonderful to be able to ask your laptop: “What was that big bid we did three years ago where we partnered with that bank?” and have it infer the right answers and give you a summary rather than a string of documents you have to read through?

Of course, before we can do this, we need to tackle AI hallucination, which is the false information generated by AI. We have developed a technology that will hallucinate images, sounds, poetry, and so on. But we are less keen on it hallucinating the company accounts or a medical record. The trick now will be to take that really nice conversational interface and link it to hard facts. Generative AI can create nonsense, which can be a big problem. Recently, Air Canada faced a small claims court case[1] from a passenger who tried to retroactively apply for a refund on his ticket after checking the company’s bereavement policy on their AI-powered chatbot. The AI hallucinated that passengers could claim back money within 90 days of travel, which isn’t in the company’s policy. The court sided with the passenger.
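One widely used way to link the conversational interface to hard facts, as described above, is retrieval-augmented generation (RAG): fetch the relevant company documents first, then have the model answer only from what was retrieved. A minimal sketch follows, with a toy word-overlap scorer standing in for a real embedding-based vector search, and the LLM call left as an assumption (the documents and names are invented for illustration):

```python
# Retrieval-augmented generation (RAG) sketch: answer questions from a
# document store instead of from the model's (possibly hallucinated)
# memory. The keyword scorer stands in for a real vector search.

DOCUMENTS = [
    "2021 bid: partnered with Northbank on a payments platform, won.",
    "Bereavement policy: refunds must be requested before travel.",
    "2023 procedure: all customer data is retained for five years.",
]

def retrieve(query, docs, k=1):
    # Score each document by word overlap with the query (toy retriever).
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    context = "\n".join(retrieve(query, docs))
    # In a real system this prompt would go to an LLM, instructed to
    # answer ONLY from the supplied context (and say so if it can't).
    return f"Answer only from this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What was the bid where we partnered with that bank?",
                   DOCUMENTS))
```

Because the model is told to answer only from retrieved text, a question about a 90-day refund window would surface the actual policy document rather than an invented rule — which is exactly the failure in the Air Canada case.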

Part of the move forward with AI will be limiting its cost, right?

Yes, the cost of running these models today, in terms of energy, cooling, and computing power, makes them unsustainable, both commercially and in the context of the climate crisis. Companies are likely to move from the existing graphics processing units (GPUs) to hardware designed around AI applications.

Apple has a “neural processing unit”, Google has a “tensor processing unit”, and Microsoft, IBM, Amazon, Samsung and others are all developing specialised hardware that can deliver performance hundreds or thousands of times more efficient than GPUs and CPUs. These chips are massively parallel and optimised for the matrix operations at the heart of machine learning algorithms.
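The “matrix operations at the heart of machine learning” are, concretely, large multiply-accumulate computations: a neural-network layer is essentially a weight matrix applied to an input vector. A tiny pure-Python illustration of that core operation (real hardware runs millions of these dot products in parallel, which is exactly what the specialised chips optimise):

```python
# A neural-network layer computes y = W @ x: each output element is the
# dot product of one weight row with the input vector. AI accelerators
# win by running these multiply-accumulates massively in parallel.

def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W = [[1.0, 2.0],
     [0.5, -1.0]]   # 2x2 weight matrix
x = [3.0, 4.0]      # input vector

print(matvec(W, x))  # [1*3 + 2*4, 0.5*3 - 1*4] = [11.0, -2.5]
```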

New chip architectures are also being proposed to run these models with very low energy. That’s the case for IBM’s NorthPole AI chip[2], for instance, which promises to reduce the power for typical applications by a factor of 25[3]. Google is also working on its tensor processing unit to accelerate AI processing, and Groq’s Language Processing Unit is also showing promise.

Then there are more esoteric architectures, such as neuromorphic chips. These are designed to support so-called spiking neural networks, computing models that mimic the way human brains work. Those are mostly in the academic domain at the moment, but they are starting to move into other areas.
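Spiking neural networks compute with discrete spikes rather than continuous activations, which is why they map so efficiently onto neuromorphic hardware. The standard building block is the leaky integrate-and-fire (LIF) neuron, sketched here with illustrative parameter values:

```python
# Leaky integrate-and-fire (LIF) neuron: the membrane potential leaks
# toward zero, accumulates input current, and emits a spike (then
# resets) when it crosses a threshold -- the basic unit that
# neuromorphic chips implement in silicon.

def lif_neuron(inputs, leak=0.9, threshold=1.0):
    potential, spikes = 0.0, []
    for current in inputs:
        potential = potential * leak + current  # leak, then integrate
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0                     # reset after spiking
        else:
            spikes.append(0)
    return spikes

# A steady weak input: the neuron integrates for a few steps, fires,
# resets, and repeats -- information is carried in the spike timing.
print(lif_neuron([0.4] * 6))
```

Because the neuron only does work when a spike arrives, arrays of such units can sit almost idle between events, which is where the energy savings of this architecture come from.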

What about the fact that AI is so heavily dominated by a few commercial entities at the moment?

Currently, there is a big debate about making LLMs open source. Due to the scale of operations needed to develop LLMs and LMMs, commercial organisations have been very much at the forefront of development. Around 80–90% of these models are developed by commercial organisations. That means that the technology has remained mostly in the hands of its proprietors, with some notable exceptions like Meta’s LLaMA and Mistral’s Large and Codestral, which were made open source early on. There are also open-source community LLMs/LMMs like Platypus, Bloom, and Falcon.

On the one hand, more people experimenting and playing with the technology could trigger new advances, expose vulnerabilities, and so on. On the other hand, there are people who will misuse that technology. There are currently fail-safes built into most of the models so that people can’t do whatever they want; however, they’re relatively easy to circumvent. And some open-source models are available in their “raw” state with no guardrails. We can expect that open-source GenAI will continue to grow. This goes hand in hand with the push to develop smaller, more sustainable models that don’t require hundreds of millions of dollars to run.

What issues can be expected in terms of misuse of such new technology?

Cybersecurity will continue to be a huge issue. Criminal organisations are quickly learning to harness this technology for nefarious purposes. They have already started using generative AI to streamline online surveillance, to mine historical data for vulnerabilities, or to automate attacks with fake texts. Scammers are also using deepfakes to swindle money out of companies. The Hong Kong police recently made six arrests[4] in relation to an elaborate scam that robbed UK engineering firm Arup[5] of $25 million. One of the company’s workers was pulled into a video conference call with what he thought was his chief financial officer. This turned out to be a deepfake video. Deepfakes are also targeting voters’ intentions with misinformation. It’s a very dangerous trend and a real threat this year, with 2024 seeing more elections than any year in human history.

While cyber scammers will continue to improve, defenders on the other side are also learning, using generative AI and other forms of AI to find attackers. There’s a constant cycle of attack and defence in the cybersecurity world.

There is also a big discussion around the use of AI in a military context. AI is already used to analyse satellite imagery or provide navigation for drones, but it is not yet known to be used to take human life. At this point, it’s still cheaper not to put AI on drones, even if it is technically feasible. And that’s a very important line, in my view, not to cross. We don’t want to enter a world where you are fighting at machine speed and your adversary is an AI; it’s then a short step to the dystopian worlds of James Cameron’s Terminator movies or the Wachowskis’ Matrix series.

We are seeing some movement from regulatory bodies, where do you expect that to go?

There is regulation starting to emerge. The European Union AI Act came into force[6] in August 2024, with the details being finalised in April this year. Everyone will be watching what impact the EU legislation has. A US presidential order published[7] in October 2023 introduced a long list of controls, including statutory reporting above a certain level of computing and networking power. We can expect more legislation to come out of the US, UK and other countries soon.


Still, unless you hold those developing AI accountable, that regulation will only go so far. At the moment, it’s free rein. If the technology puts millions of people out of jobs or causes a mental health epidemic, corporations can shrug their shoulders and say they don’t control how people use the technology. On the other hand, if large corporates are the only organisations willing or able to invest the tens of billions necessary to develop these AI systems, nobody wants to stall this and risk falling behind other countries.

We need legislation and regulation where organisations and individuals are accountable for the impact of their technologies. That would make them think carefully about how their technology is going to be used and put the onus on them to properly explore and test its impact. You can see this is an area of tension for some of the GenAI companies: OpenAI, for example, has lost several leading people[8] from the company, each of whom has hinted at the lack of oversight in GenAI development.

Anything else we should be looking out for?

There are advances that are over the horizon, but you can see that they will come. And those will be very significant. I think the convergence of quantum computing and AI will be interesting. Some companies, like IBM, are now bringing forward their roadmaps on quantum computing: IBM is foreshadowing 200 qubits and 100 million computing gates by 2029[9]. That is very powerful technology that may allow AI to learn in real time, and that gets really exciting.

Over the past 12 months or so, people have been applying the large language model approach to robotics, in so-called Vision Language Action models, or VLAs. In the same way that we’ve built foundation models for text and images, we may be able to build them for robotic perception, action, and movement. These aim to get to a place where, for instance, you can tell a robot to pick up a banana and it has enough general knowledge not only to spot the banana with its sensors but to figure out what to do with it, without requiring specific algorithmic input. It’s quite an interesting advancement in robotics because it also allows the AI to learn from physical, real-world experience.

AI mentors could be another big thing. AIs are already being used to generate learning material, but you can imagine a world where an AI scans your CV and is able to suggest training, reading material, and so on. AIs could also act as tutors, guiding you through education, suggesting ways of learning, setting exams and assessments, and following your development. Schools are already piloting the use of GenAI as tutors: David Game College in London[10], for example, is trialling an accelerated GCSE in which students are taught only by AI. At that point, you’re changing the entire educational loop.

The question might then be: why would you go to university? Why would you even go to school, apart from its social benefits? It could fundamentally change how we learn and teach. Some may be concerned that we would start to build new education systems that are dependent on US tech companies, rather than on in-country qualified human beings.

What kind of timescale are we thinking of for these advancements?

I think if we’ve learned anything from the last couple of years, it’s that things can happen really fast. Things are never as far-fetched as we imagine them to be; science fiction has a disturbing habit of becoming science fact. I would say much of it is disturbingly close.

Now we need to start thinking about the consequences of this. What is humanity’s role in this future? What do economies look like if humans are taken out of the equation? What do truth and democracy look like when anything can be faked? What does education, the foundation of our modern quality of life, look like in the future? These are very big, fundamental questions that I think no one has the answer to at the moment.

Interview by Marianne Guenot
[1] https://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
[2] https://research.ibm.com/blog/northpole-ibm-ai-chip
[3] https://spectrum.ieee.org/neuromorphic-computing-ibm-northpole
[4] https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html
[5] https://www.ft.com/content/b977e8d4-664c-4ae4-8a8e-eb93bdf785ea
[6] https://commission.europa.eu/news/ai-act-enters-force-2024-08-01_en
[7] https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
[8] https://www.ft.com/content/638f67f7-5375-47fc-b3a7-af7c9e05b9e0
[9] https://www.ibm.com/roadmaps/quantum.pdf
[10] https://www.bbc.co.uk/sounds/play/m0021x2v
