Digital innovations for better health

How we can better predict future epidemics

with Etienne Minvielle, Director of the Centre de Recherche en Gestion at Ecole Polytechnique (IP Paris) and Antoine Flahault, PhD in biomathematics
On January 10th, 2024 | 4 min reading time
Key takeaways
  • Since the 20th century, various types of models have been used to predict the risk of epidemics, and they have proved their effectiveness.
  • Big Data has enabled these models to evolve. They are now used to anticipate epidemics more effectively, so that humanitarian aid can be concentrated in the area at risk, at the key moment.
  • However, a number of challenges remain: adapting predictions to local contexts, and moving from prediction to action, a transition hampered by socio-economic factors.
  • Combining Big Data processing, epidemiological expertise and algorithmic processing would multiply the potential of these models.

Epi­dem­ic risk pre­dic­tion mod­els help to ana­lyse the tem­por­al and geo­graph­ic­al evol­u­tion of epi­dem­ics. They exis­ted well before digit­al algorithms. How­ever, since the advent of Big Data, these mod­els have evolved con­sid­er­ably, rais­ing sev­er­al ques­tions. How reli­able is the pre­dic­tion? How can we assess our abil­ity to col­lect data? What is the role of these mod­els when it comes to tak­ing action?

Predictive models: before and after Big Data

Since the 20th century, many models have been developed and have proved their worth. The SIR mathematical model, created in 1927, forms the basis of most epidemiological models. It describes flows between three compartments: the susceptible (S), the infected and contagious (I), and those removed from the transmission chain (R), i.e. people who have been immunised or have died [1].

An infection is said to be epidemic when the number of sufferers increases over time, i.e. when the basic reproduction number R0 (the average number of new infections generated by a single case) is greater than 1, as we saw during the COVID-19 crisis. In short, this means that each case generates, on average, more than one new case.
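As a minimal sketch of the compartmental logic described above, the following Python code steps a basic SIR model forward with a simple Euler scheme. The parameter names `beta` (transmission rate) and `gamma` (removal rate), their values and the population size are illustrative assumptions, not figures from the interview. In this textbook formulation, R0 equals beta / gamma, and an outbreak takes off only when that ratio exceeds 1.

```python
def simulate_sir(beta, gamma, s0, i0, removed0=0.0, days=160, dt=0.1):
    """Integrate a basic SIR model with a forward-Euler scheme.

    beta  : assumed transmission rate per day (hypothetical value)
    gamma : assumed removal rate per day (hypothetical value)
    Returns a list of (day, S, I, R) tuples, one per whole day.
    """
    s, i, r = float(s0), float(i0), float(removed0)
    n = s + i + r                      # total population, constant in this model
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I
            new_removals = gamma * i * dt            # flow I -> R
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((day, s, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """R0 for this simple model: new cases generated per case at the outset."""
    return beta / gamma


# Two hypothetical scenarios: R0 = 3 (outbreak grows) and R0 = 0.5 (it fades).
growing = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10)
fading = simulate_sir(beta=0.05, gamma=0.1, s0=9990, i0=10)

peak_growing = max(i for _, _, i, _ in growing)
peak_fading = max(i for _, _, i, _ in fading)

print(f"R0 = {basic_reproduction_number(0.3, 0.1):.1f}: peak infected ~ {peak_growing:.0f}")
print(f"R0 = {basic_reproduction_number(0.05, 0.1):.1f}: peak infected ~ {peak_fading:.0f}")
```

With the first set of assumed parameters the number of infected people rises well above its starting value before falling back; with the second it never exceeds the initial ten cases, which is the threshold behaviour that R0 summarises.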

At the same time, other models, such as ARIMA [2] and SARIMA [3], are not based on the SIR model but on time series. They assume that patterns observed in past series will recur in future episodes. These models are effective for seasonal events such as influenza.
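The time-series idea can be illustrated with a deliberately simple sketch. The code below is not ARIMA or SARIMA: it is a seasonal-naive baseline, which forecasts each future point by repeating the value observed at the same position in the previous season. The function name and the weekly case counts are hypothetical and serve only as an example.

```python
def seasonal_naive_forecast(series, season_length, horizon):
    """Forecast `horizon` points by repeating the last observed season.

    This is a simple baseline, not an ARIMA/SARIMA model: it only encodes
    the assumption that the most recent seasonal pattern will recur.
    """
    if season_length <= 0 or len(series) < season_length:
        raise ValueError("need at least one full season of observations")
    last_season = series[-season_length:]
    return [last_season[k % season_length] for k in range(horizon)]


# Hypothetical case counts over two "seasons" of four periods each.
observed = [120, 340, 560, 210, 130, 360, 590, 220]
forecast = seasonal_naive_forecast(observed, season_length=4, horizon=6)
print(forecast)
```

Models such as SARIMA go further by also fitting trends and the correlation between successive observations, but the underlying bet is the same: the past pattern is informative about the next episode.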

With the emergence of Big Data, new predictive models have appeared. These can be used to anticipate epidemics so that humanitarian aid can be concentrated in the area at risk, at the key moment [4]. In recent years, several practical applications have proven their worth. For example, to combat Ebola in Africa, Médecins Sans Frontières built health centres in areas of high traffic flows, identified using data from telephone operators [5]. Capturing new kinds of data in large volumes opens up new possibilities for treatment. In this respect, algorithmic prevention is effective.

Unpredictability and action: the challenges of these Big Data models

At present, models appear to be better at predicting the short or immediate term than the longer term. None of the recent diseases (COVID-19, Zika, West Nile, Chikungunya) was predicted; each time, we only caught up once the disease was already present. When we try to predict the risk of an epidemic occurring, models tend to overestimate it.

In January 2013, for example, the Google Flu Trends interface wrongly predicted a serious flu epidemic in New York. Based on this prediction, large-scale preventive measures were launched, which then proved completely unnecessary. Similarly, the CDC in Atlanta (the US Centers for Disease Control and Prevention) predicted that there would be over a million cases of Ebola in Liberia; fortunately, there were only a few tens of thousands.

On the other hand, models are effective for tracking the development of epidemics over the short term. Google Flu Trends has demonstrated this on many occasions. Another example: during the COVID-19 pandemic, Google Verily, the University of Geneva and the Swiss federal institutes of technology in Lausanne and Zurich (EPFL and ETH Zurich) were able to predict short-term epidemic waves.

The other problem with models is the relationship between results and action. On the one hand, a model produced on a national scale does not necessarily have sufficient power to assess a local situation. During the COVID-19 pandemic, for example, a local model was developed in Martinique in addition to the national Pasteur model. This very simple model forecast the number of COVID beds needed over 14 days if there was no containment. During the 4th wave (the largest), the model predicted that 700 COVID beds would be needed, which proved fairly reliable since 600 beds were actually used. This model was effective in anticipating the impact on day hospital services for chronic diseases and in opening beds accordingly, demonstrating the need to supplement global analyses with others that are localised and adapted to the local context.
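To put "fairly reliable" into numbers, the short sketch below compares the 700 beds forecast with the 600 beds actually used, both figures taken from the paragraph above. The function name and the choice of expressing the error relative to actual use are illustrative assumptions.

```python
def forecast_error(predicted, observed):
    """Return the absolute error and the error as a percentage of the observed value."""
    absolute = predicted - observed
    return absolute, 100 * absolute / observed


beds_error, beds_error_pct = forecast_error(predicted=700, observed=600)
print(f"Overestimate: {beds_error} beds ({beds_error_pct:.1f}% above actual use)")
```

On these figures the model overestimated need by 100 beds, or roughly 17% above actual use, an error on the cautious side for hospital planning.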

On the other hand, regardless of the model chosen, its reliability and its adaptation to a local context, prediction alone cannot govern action. The COVID-19 pandemic showed that many people were reluctant to be vaccinated, with varying profiles and motivations. In China, elderly people were discouraged by doctors who cited their fragile health. Among African-American and West Indian populations, it was rather a lack of confidence in the Western powers that appeared to be the reason for resistance to vaccination. There were therefore many reasons for the reluctance to be vaccinated, and these were independent of the question of prediction.

Gen­er­ally speak­ing, these chal­lenges reveal that the trans­ition from pre­dic­tion to pre­vent­ive action is not lin­ear and sequen­tial. Oth­er socio-eco­nom­ic factors come into play, under­lin­ing the import­ance of pla­cing pre­dict­ive mod­els in the con­text of their use.

The future of prediction: towards multidimensional, unified integration?

The con­tri­bu­tion of Big Data seems likely to improve mat­ters. Multi-level pre­dict­ive mod­els could be developed by com­bin­ing more epi­demi­olo­gic­al expert­ise, Big Data and algorithmic pro­cessing. Such mod­el­ling, in which each lay­er could con­trib­ute to accur­acy, con­cerns a vari­ety of data: satel­lite imagery, bio­lo­gic­al data, eco­nom­ic and social data, health mon­it­or­ing, etc.

This presupposes more dynamic data collection and sharing. In this respect, the way data was shared in France during the COVID-19 crisis showed that this approach is not yet second nature. It would have been, and still is, desirable to set up a unified data warehouse on which experts can draw for the data they need. To achieve this, we need to learn how to organise the sharing of existing data. This is a major challenge if we are to make progress in the algorithmic prevention of epidemic risks.

Interview by Loraine Odot
[1] Weiss HH. The SIR model and the foundations of public health. Mater Mat. 2013; 2013(3): 1–17.
[2] Singh RK, Rani M, Bhagavathula AS, Sah R, Rodriguez-Morales AJ, Kalita H, et al. Prediction of the COVID-19 Pandemic for the Top 15 Affected Countries: Advanced Autoregressive Integrated Moving Average (ARIMA) Model. JMIR Public Health Surveill. 2020; 6(2): e19115.
[3] Perone G. Using the SARIMA model to forecast the fourth global wave of cumulative deaths from COVID-19: Evidence from 12 hard-hit big countries. Econometrics. 2022; 10(2): 18.
[4] Colston JM, Ahmed T, Mahopo C, et al. Evaluating meteorological data from weather stations, and from satellites and global models for a multi-site epidemiological study. Environ Res. 2018; 165: 91–109.
[5] Brinkel J, Kramer A, Krumkamp R, et al. Mobile phone-based mHealth approaches for public health surveillance in sub-Saharan Africa: a systematic review. Int J Environ Res Public Health. 2014; 11: 11559–11582.
