Covid-19 death rates are higher than reported

Nicolas Chopin
Professor of data science and machine learning at the ENSAE Centre for Research in Economics and Statistics*

Since the beginning of the pandemic, the world has turned its attention to Covid-19 data. The most often cited figures are the counts of new cases and the number of deaths. Yet neither of these figures is completely reliable. During the lockdown in March 2020, Nicolas Chopin, Professor of data science and statistics at the ENSAE, took it upon himself to look deeper into the figures provided by the French authorities. As others have done around the world, he found that true Covid-19 mortality rates were likely to be higher than those reported.

During the lockdown, you took a look at the pandemic data differently. What did you find?

At the beginning of the pandemic everyone was talking about Covid-19 and I wanted to be helpful in some way. I'm a data scientist, so I sought out Open Data. Originally, I wanted to compare the evolution of the pandemic between countries. But very quickly I was confronted with the reality that this isn't possible. Testing strategies varied enormously between nations, especially at the beginning of the pandemic, so it was obvious that the number of new cases was an unreliable basis for comparison.

Instead, I looked at the number of deaths. Access to the data was actually very easy in France because the two main statistics institutes, Santé Publique France (SPF) and the Institut National de la Statistique et des Etudes Economiques (INSEE), provide precise data on a daily basis. As the pandemic progressed, they also progressively improved the data made available.

Nevertheless, these figures didn't quite match up either. What was missing from the data?

First, the SPF provides extensive amounts of data coming from hospitals. However, this is limited to hospital-related information and so only covers patients treated in hospital. Second, the INSEE releases exhaustive data on deaths that have occurred since March 2020, but these data contain no information on the cause of death. Taken alone, each data set is difficult to interpret. Together, however, they provide valuable insight into the progression of the pandemic in France.

I started by comparing the number of deaths recorded by the INSEE this year with those of previous years. Of course, there was an increase: this is the excess mortality, the number of extra deaths compared with the number of expected deaths. It was 60% higher on average than the recorded Covid-19 deaths provided by the SPF. It is likely that these additional deaths are mostly Covid-19 deaths in retirement homes for the elderly. But they could also be due to related causes, such as a lack of treatment for other health problems because of the crisis.
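The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis published on the blog: it assumes that expected deaths are taken as the mean of the same period in previous years, and every number and name in it is hypothetical (it is not INSEE or SPF data).

```python
from statistics import mean


def excess_deaths(observed, baseline_years):
    """Observed all-cause deaths minus expected deaths.

    Assumption: expected deaths are the mean of the same period
    in earlier years. Other baselines are possible.
    """
    return observed - mean(baseline_years)


def excess_to_reported_ratio(observed, baseline_years, reported_covid_deaths):
    """Ratio of excess deaths to officially recorded Covid-19 deaths."""
    return excess_deaths(observed, baseline_years) / reported_covid_deaths


# Hypothetical figures for one region and one period, for illustration only.
observed_2020 = 2600                           # all-cause deaths this year
same_period_earlier = [1750, 1820, 1830]       # all-cause deaths, earlier years
hospital_covid_deaths = 500                    # Covid-19 deaths recorded in hospital

excess = excess_deaths(observed_2020, same_period_earlier)
ratio = excess_to_reported_ratio(
    observed_2020, same_period_earlier, hospital_covid_deaths
)
print(f"excess deaths: {excess}, ratio to recorded deaths: {ratio:.2f}")
```

With these made-up inputs, a ratio above 1 means that excess mortality exceeds the recorded Covid-19 deaths, which is the kind of gap discussed in the interview.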

Later, I analysed the same data by separating men and women. Here, I found that the factor (the ratio of excess deaths to recorded Covid-19 deaths) was 1.56 for men and 2.4 for women. This makes sense, as there are three times more women than men in retirement homes on average across the country. These institutions are much less equipped to collect and share data, hence the discrepancy in the numbers.

Excess mortality versus Covid-19 deaths in hospital during weeks 13 to 15 in 2020. Each dot corresponds to one week, one region, and one sex (men, women).
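One simple way to turn a scatter of such dots into a single factor per sex is to fit a line through the origin. The interview does not say how the factors were estimated, so the least-squares fit below is an assumption, and the points are hypothetical values chosen only to make the sketch runnable.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = k * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical dots: (Covid-19 deaths in hospital, excess deaths),
# one per week and region. Not real data.
points = {
    "men": [(100, 160), (200, 290), (50, 80)],
    "women": [(80, 170), (120, 230), (40, 85)],
}

factors = {}
for sex, pts in points.items():
    hospital, excess = zip(*pts)   # split pairs into two sequences
    factors[sex] = slope_through_origin(hospital, excess)

for sex, k in factors.items():
    print(f"{sex}: excess deaths are about {k:.2f} times hospital deaths")
```

Forcing the line through the origin encodes the assumption that zero hospital deaths should correspond to zero excess deaths. A fit with an intercept would relax that.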

In the end, did this provide you with a way to compare French data to other countries as you had originally hoped for?

I contacted colleagues of mine in Italy, Spain, England and Germany. But no other country could provide me with up-to-date data like ours. The English equivalent of the SPF, Public Health England (PHE), struggled to keep up to date, so much less data was available. In France, we have encountered problems like this in the past. During the summer heat wave in 2003, many elderly people lost their lives. At the time, there was little data available about mortality in the public domain, and it was funeral directors who sounded the alarm as they were inundated with funerals. As a result, the data is now readily available.

Your work was only published on your blog, but it still made an impact. Can you tell us more about what happened?

An interesting point is that many people could have done it, not just me. The analysis that I ran is basic enough to have been done by an undergraduate student. Yet it highlights how much of data science is about having the right data set to study in the first place. There were others like me who came to the same conclusion about how to analyse the data, independently of my analyses. Specialised journalists at the New York Times ran an article about the excess mortality rate, a story which was also picked up by The Guardian and Le Monde.

For my part, I was invited to a round table at the Académie des Sciences along with other mathematicians who worked on different aspects of the pandemic. Also, another data scientist in the USA, Gaurav Sood, contacted me. He went further with the analyses for his own blog, calculating the average number of years of life lost per death in different countries. The idea behind this is that many people might say "the ones who die are old and they would have died anyway". On the contrary, he showed that those who died of Covid-19 lost 9 years on average.
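A common way to compute an average number of years lost per death is to weight the remaining life expectancy at each age by the number of deaths at that age. The sketch below shows that calculation only. It is not a reconstruction of Gaurav Sood's analysis, and both tables are hypothetical placeholders rather than real mortality data or a real life table.

```python
def mean_years_lost(deaths_by_age, remaining_life_expectancy):
    """Average years of life lost per death.

    deaths_by_age: {age: number of deaths at that age}
    remaining_life_expectancy: {age: expected remaining years at that age}
    """
    total_deaths = sum(deaths_by_age.values())
    if total_deaths == 0:
        raise ValueError("no deaths recorded")
    total_years = sum(
        count * remaining_life_expectancy[age]
        for age, count in deaths_by_age.items()
    )
    return total_years / total_deaths


# Hypothetical inputs, for illustration only.
deaths_by_age = {70: 20, 80: 50, 90: 30}
remaining_life_expectancy = {70: 15.0, 80: 9.0, 90: 4.0}

years_lost = mean_years_lost(deaths_by_age, remaining_life_expectancy)
print(f"average years of life lost per death: {years_lost:.1f}")
```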

There are very few countries where the data regarding the number of cases are completely reliable, except perhaps Germany. In certain countries with little testing, like Bolivia, the only way to follow the pandemic is via excess mortality. In France, we are somewhere in the middle. We have data for the number of cases, but the excess mortality rate gives a truer picture of the figures for comparison. In the USA, the Covid-19 case figures are not well measured. This is a politically sensitive issue. For me, making data like this open and transparent is part of a healthy democratic society.

Do your findings on the analysis of excess mortality still hold for the second wave?

Not all the data are available yet: the second wave is still under way and all-cause mortality data are made available with a two-week delay. However, on the basis of the data available so far, it seems to me that the phenomenon observed during the first wave is now less marked. In other words, the difference between Covid-19 mortality observed in hospital, on the one hand, and excess mortality, on the other, is still there but is smaller. At this stage, any explanation is necessarily very hypothetical, but it may simply be that retirement homes are now better able to cope with Covid-19.

To find out more, you can visit Nicolas Chopin's blog here.

Interview by James Bowers

Contributors

Nicolas Chopin

Professor of data science and machine learning at the ENSAE Centre for Research in Economics and Statistics*

Nicolas Chopin is the author or co-author of one book and more than 60 research papers on topics such as computational statistics, Bayesian inference, and probabilistic machine learning. He is a fellow of the IMS (Institute of Mathematical Statistics), a member and former secretary of the research section of the RSS (Royal Statistical Society), and a current or former associate editor for Annals of Statistics, Biometrika, Journal of the Royal Statistical Society, Statistics and Computing, and Statistical Methods & Applications.
*CREST: a joint research unit of CNRS, École Polytechnique - Institut Polytechnique de Paris, ENSAE Paris - Institut Polytechnique de Paris, GENES
