Digital innovations for better health

How we can better predict future epidemics

Etienne Minvielle, CNRS Research Director and Professor of Health Management at Ecole Polytechnique (IP Paris) and Antoine Flahault, PhD in biomathematics
On January 10th, 2024
Key takeaways
  • Since the 20th Century, various types of model have been used to predict the risk of epidemics, and have proved their effectiveness.
  • Big Data has enabled these models to evolve: they are now used to anticipate epidemics more effectively, so that humanitarian aid can be concentrated in the area at risk, at the key moment.
  • However, a number of challenges remain: adapting predictions to local contexts, and moving from prediction to action, a transition hampered by socio-economic factors.
  • Combining Big Data processing, epidemiological expertise and algorithmic processing would multiply the potential of these models.

Epidemic risk prediction models help to analyse the temporal and geographical evolution of epidemics. They existed well before digital algorithms. However, since the advent of Big Data, these models have evolved considerably, raising several questions. How reliable is the prediction? How can we assess our ability to collect data? What is the role of these models when it comes to taking action?

Predictive models: before and after Big Data

Since the 20th century, many models have been developed and have proved their worth. The SIR mathematical model, created in 1927, forms the basis of most epidemiological models. It describes flows between three compartments: the susceptible (S), the infected and contagious (I), and those removed from the transmission chain (R), i.e. people who have been immunised or have died [1].
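The flows between the three compartments can be sketched in a few lines of Python. This is a minimal illustration, not the model used by any team mentioned here: the parameter names `beta` (transmission rate) and `gamma` (removal rate), their values, and the population size are all assumptions chosen for the example, and the equations are integrated with a simple forward-Euler step.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with a forward-Euler scheme.

    Returns a list of (S, I, R) tuples, one per day, day 0 included.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Flow S -> I: contacts between susceptible and infected people.
            new_infections = beta * s * i / population * dt
            # Flow I -> R: infected people leaving the transmission chain.
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameters, for illustration only.
POPULATION = 1_000_000
history = simulate_sir(POPULATION, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)

# Day on which the number of infected people is highest.
peak_day = max(range(len(history)), key=lambda day: history[day][1])
s_end, i_end, r_end = history[-1]
print(f"Peak of infections on day {peak_day}")
print(f"Share of the population ever infected: {r_end / POPULATION:.1%}")
```

Changing `beta` or `gamma` shifts both the timing and the height of the peak, which is the kind of temporal evolution these models are used to analyse.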

An infection is said to be epidemic when the number of sufferers increases over time, i.e. when the basic reproduction number R0 (the average number of new infections generated by each case) is greater than 1, as we saw during the COVID-19 crisis. In short, this means that each case generates, on average, more than one new case.
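The condition under which the number of cases increases can be read off the SIR equations. The notation below is an assumption rather than the article's own: β is the transmission rate, γ the removal rate and N the total population.

```latex
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I
              = \gamma I \left( R_0 \frac{S}{N} - 1 \right),
\qquad R_0 = \frac{\beta}{\gamma}
```

At the start of an outbreak almost everyone is susceptible (S ≈ N), so the number of infected people grows only if R0 is greater than 1.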

Other models, such as ARIMA [2] and SARIMA [3], are not based on the SIR model but on time series. They assume that patterns observed in past data will recur in future episodes. These models are effective for seasonal events such as influenza.
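Fitting a full ARIMA or SARIMA model requires a statistics library. The sketch below only illustrates the underlying idea with a much simpler stand-in, a seasonal-naive forecast, which predicts each future week from the same week in previous seasons. It is not ARIMA, the function name is hypothetical, and the weekly counts are synthetic rather than real surveillance data.

```python
import math


def seasonal_naive_forecast(series, season_length, horizon):
    """Forecast `horizon` future points by averaging, for each position in
    the season, the values observed at that position in past seasons.

    Assumes `series` starts at position 0 of a season.
    """
    if len(series) < season_length:
        raise ValueError("need at least one full season of history")
    forecasts = []
    for step in range(horizon):
        position = (len(series) + step) % season_length
        # Every past observation made at the same point of the season.
        past = series[position::season_length]
        forecasts.append(sum(past) / len(past))
    return forecasts


# Synthetic weekly case counts: three identical 52-week seasons
# with a peak at week 0 of each season.
season = [100 + 80 * math.cos(2 * math.pi * week / 52) for week in range(52)]
weekly_cases = season * 3

forecast = seasonal_naive_forecast(weekly_cases, season_length=52, horizon=4)
print([round(value) for value in forecast])
```

Because the forecast simply replays past seasons, it works for recurring events such as seasonal influenza and cannot anticipate a disease that has never appeared in the data.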

With the emergence of Big Data, new predictive models have appeared. These can be used to anticipate epidemics so that humanitarian aid can be concentrated in the area at risk, at the key moment [4]. In recent years, several practical applications have proved their worth. For example, to combat Ebola in Africa, Médecins Sans Frontières built health centres in areas of high traffic flows, identified using data from telephone operators [5]. Capturing new kinds of data in large volumes opens up new processing possibilities. In this respect, algorithmic prevention is effective.

Unpredictability and action: the challenges of these Big Data models

At present, models seem better at predicting the short or immediate term than the longer term. None of the recent diseases (COVID-19, Zika, West Nile, Chikungunya) was predicted: each time, we caught up once the disease was already there. And when we try to predict the risk of an epidemic occurring, models tend to overestimate it.

In January 2013, for example, the Google Flu Trends interface wrongly predicted a serious flu epidemic in New York. Based on this prediction, large-scale preventive measures were launched, which then proved to be completely unnecessary. Similarly, the CDC in Atlanta (the US Centers for Disease Control and Prevention) predicted that there would be over a million cases of Ebola in Liberia, but fortunately there were only a few tens of thousands of cases.

On the other hand, models are effective for tracking the development of epidemics over the short term. Google Flu Trends has demonstrated this on many occasions. Another example: during the COVID-19 pandemic, Google Verily, the University of Geneva and the Swiss Federal Institutes of Technology in Lausanne and Zurich were able to predict short-term epidemic waves.

The other problem with models is the relationship between results and action. On the one hand, a model produced on a national scale does not necessarily have sufficient power to assess a local situation. During the COVID-19 pandemic, for example, a local model was developed in Martinique in addition to the national Pasteur model. This very simple model forecast the number of COVID beds needed over 14 days if there was no containment. During the fourth wave (the largest), the model predicted that 700 COVID beds would be needed, which proved fairly reliable since 600 beds were actually used. This model was effective in anticipating the impact on day hospital services for chronic diseases and in opening beds accordingly, demonstrating the need to supplement global analyses with others that are localised and adapted to the local context.
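The article does not describe how the Martinique model was built. As a rough idea of what a very simple 14-day bed forecast can look like, the sketch below assumes unchecked exponential growth with a fixed doubling time. The function names, the starting bed count and the doubling time are hypothetical. Only the 700 predicted and 600 observed beds come from the text.

```python
def project_beds(current_beds, doubling_time_days, horizon_days=14):
    """Project occupied beds day by day, assuming exponential growth
    with a constant doubling time (no containment measures)."""
    return [current_beds * 2 ** (day / doubling_time_days)
            for day in range(horizon_days + 1)]


def relative_error(predicted, observed):
    """Signed error of a forecast, as a fraction of the observed value."""
    return (predicted - observed) / observed


# Hypothetical inputs, chosen only for illustration.
projection = project_beds(current_beds=100, doubling_time_days=7)
print(f"Projected beds needed in 14 days: {projection[-1]:.0f}")

# Figures quoted for the fourth wave: 700 beds predicted, 600 used.
overestimate = relative_error(predicted=700, observed=600)
print(f"Forecast overestimate: {overestimate:.0%}")
```

An overestimate of this size, about one sixth of the beds actually used, is consistent with the model being described as fairly reliable for planning purposes.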

On the other hand, regardless of the model chosen, its reliability and its adaptation to a local context, prediction alone cannot govern action. The COVID-19 pandemic showed that many people were reluctant to be vaccinated, with varying profiles and motivations. In China, elderly people were discouraged by doctors who cited their fragile health. In the case of the African-American and West Indian populations, it was rather a lack of confidence in the Western powers that appeared to be the reason for resistance to vaccination. There were therefore many reasons for the reluctance to be vaccinated, and these were independent of the question of prediction.

Generally speaking, these challenges reveal that the transition from prediction to preventive action is not linear and sequential. Other socio-economic factors come into play, underlining the importance of placing predictive models in the context of their use.

The future of prediction: towards multidimensional, unified integration?

The contribution of Big Data seems likely to improve matters. Multi-level predictive models could be developed by combining epidemiological expertise, Big Data and algorithmic processing more closely. Such modelling, in which each layer could contribute to accuracy, draws on a variety of data: satellite imagery, biological data, economic and social data, health monitoring, etc.

This presupposes more dynamic data collection and sharing. In this respect, the reporting of data in France during the last Covid crisis showed that the approach is not yet spontaneous. It would have been, and still is, desirable to set up a unified data warehouse, so that experts can draw on it for the data they need. To achieve this, we need to learn how to organise the sharing of existing data. This is a major challenge if we are to make progress in the algorithmic prevention of epidemic risks.

Interview by Loraine Odot
1. Weiss HH. The SIR model and the foundations of public health. Mater Mat. 2013;2013(3):1–17.
2. Singh RK, Rani M, Bhagavathula AS, Sah R, Rodriguez-Morales AJ, Kalita H, et al. Prediction of the COVID-19 Pandemic for the Top 15 Affected Countries: Advanced Autoregressive Integrated Moving Average (ARIMA) Model. JMIR Public Health Surveill. 2020 May 13;6(2):e19115.
3. Perone G. Using the SARIMA model to forecast the fourth global wave of cumulative deaths from COVID-19: Evidence from 12 hard-hit big countries. Econometrics. 2022;10(2):18.
4. Colston JM, Ahmed T, Mahopo C, et al. Evaluating meteorological data from weather stations, and from satellites and global models for a multi-site epidemiological study. Environ Res. 2018;165:91–109.
5. Brinkel J, Kramer A, Krumkamp R, et al. Mobile phone-based mHealth approaches for public health surveillance in sub-Saharan Africa: a systematic review. Int J Environ Res Public Health. 2014;11:11559–11582.
