Covid-19 death rates are higher than reported

Nicolas Chopin, Professor of data science and machine learning at the ENSAE Centre for Research in Economics and Statistics*

Since the beginning of the pandemic, the world has turned its attention to Covid-19 data. The figures most often cited are the counts of new cases and of deaths. Yet neither of these figures is completely reliable. During the lockdown in March 2020, Nicolas Chopin, professor of data science and statistics at ENSAE, took it upon himself to look deeper into the figures provided by the French authorities. Like others around the world, he found that true Covid-19 mortality rates were likely to be higher than those reported.

During the lockdown, you took a different look at the pandemic data. What did you find?

At the beginning of the pandemic, everyone was talking about Covid-19 and I wanted to be helpful in some way. I’m a data scientist, so I sought out open data. Originally, I wanted to compare the evolution of the pandemic between countries. But very quickly I was confronted with the reality that this isn’t possible. Testing strategies varied enormously between nations, especially at the beginning of the pandemic, so it was obvious that the number of new cases was an unreliable basis for comparison.

Instead, I looked at the number of deaths. Access to the data was actually very easy in France because the two main statistics institutes, Santé Publique France (SPF) and the Institut National de la Statistique et des Études Économiques (INSEE), provide precise data on a daily basis. As the pandemic progressed, they also steadily improved the data made available.

Nevertheless, these figures didn’t quite match up either. What was missing from the data?

First, SPF provides extensive amounts of data coming from hospitals. However, this is limited to hospital-related information and so only covers patients treated in hospital. Second, INSEE releases exhaustive data on deaths that have occurred since March 2020, but these data contain no information on the cause of death. On its own, each data set is difficult to interpret. Together, however, they provide valuable insight into the progression of the pandemic in France.

I started by comparing the number of deaths reported by INSEE this year with those of previous years. Of course, there was an increase: this is the excess mortality, the number of extra deaths compared to the number of expected deaths. On average, it was 60% higher than the recorded Covid-19 deaths provided by SPF. It is likely that most of this gap is due to Covid-19 deaths in retirement homes for the elderly. But it could also be due to related causes, such as a lack of treatment for other health problems because of the crisis.
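The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the analysis: the function names are hypothetical, the expected number of deaths is assumed to be a simple average of the same week in earlier years, and the numbers are made up rather than taken from INSEE or SPF.

```python
from statistics import mean


def excess_deaths(observed, baseline_years):
    """All-cause deaths observed this year minus the expected number.

    Assumption: 'expected' is the plain average of the same period
    in earlier years. Other baselines are possible.
    """
    expected = mean(baseline_years)
    return observed - expected


def excess_to_hospital_ratio(observed, baseline_years, hospital_covid_deaths):
    """Ratio of excess deaths to Covid-19 deaths recorded in hospital."""
    return excess_deaths(observed, baseline_years) / hospital_covid_deaths


# Made-up figures for one region and one week (NOT real INSEE or SPF data).
observed = 1800                      # all-cause deaths this year
baseline_years = [1000, 980, 1020]   # same week in three earlier years
hospital_covid_deaths = 500          # Covid-19 deaths recorded in hospital

excess = excess_deaths(observed, baseline_years)
ratio = excess_to_hospital_ratio(observed, baseline_years, hospital_covid_deaths)

print(f"excess deaths: {excess}")
print(f"excess / hospital Covid-19 deaths: {ratio:.2f}")
```

A ratio above 1 means that excess deaths exceed the Covid-19 deaths counted in hospital. A ratio of 1.6 corresponds to excess mortality being 60% higher than the hospital count.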

Later, I analysed the same data separately for men and women. Here, I found that the ratio of excess deaths to recorded Covid-19 deaths was 1.56 for men and 2.4 for women. This makes sense, as on average across the country there are three times more women than men in retirement homes. These institutions are much less well equipped to collect and share data, hence the discrepancy in the numbers.

Excess mortality versus Covid-19 deaths in hospital during weeks 13 to 15 of 2020. Each dot corresponds to one week, one region, and one sex (men or women).

In the end, did this provide you with a way to compare French data with those of other countries, as you had originally hoped?

I contacted colleagues of mine in Italy, Spain, England and Germany. But no other country could provide me with up-to-date data like ours. The English equivalent of SPF, Public Health England (PHE), struggled to keep up to date, so much less data was available. In France, we have encountered problems like this in the past. During the summer heatwave of 2003, many elderly people lost their lives. At the time, there was little mortality data in the public domain, and it was funeral directors who sounded the alarm as they were inundated with funerals. As a result, the data is now readily available.

Your work was only published on your blog, but it still made an impact. Can you tell us more about what happened?

An interesting point is that many people could have done it, not just me. The analysis that I ran is basic enough to have been done by an undergraduate student. Yet it highlights how much of data science is about having the right data set to study in the first place. There were others like me who came to the same conclusion about how to analyse the data, independently of my analyses. Specialised journalists at the New York Times ran a piece on excess mortality, a story which was also picked up by The Guardian and Le Monde.

For my part, I was invited to a round table at the Académie des Sciences along with other mathematicians who worked on different aspects of the pandemic. Another data scientist, Gaurav Sood in the USA, also contacted me. He took the analyses further for his own blog, calculating the average number of years of life lost per death in different countries. The idea behind this is that many people might say “the ones who die are old and they would have died anyway”. On the contrary, he showed that those who died of Covid-19 lost 9 years on average.
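An average of this kind can be sketched as a weighted mean of remaining life expectancy over the ages at death. The snippet below is only one common way to compute such a figure and is not necessarily the method Gaurav Sood used; the function name, the age bands and every number in it are hypothetical and chosen purely for illustration.

```python
def average_years_lost(deaths_by_age, remaining_life_expectancy):
    """Average years of life lost per death.

    deaths_by_age: number of deaths in each age band.
    remaining_life_expectancy: expected remaining years of life for
    someone in that age band (assumed to come from a life table).
    """
    total_deaths = sum(deaths_by_age.values())
    total_years = sum(
        count * remaining_life_expectancy[age_band]
        for age_band, count in deaths_by_age.items()
    )
    return total_years / total_deaths


# Made-up inputs (NOT real mortality or life-table data).
deaths_by_age = {"60-69": 10, "70-79": 30, "80+": 60}
remaining_life_expectancy = {"60-69": 20.0, "70-79": 12.0, "80+": 6.0}

years_lost = average_years_lost(deaths_by_age, remaining_life_expectancy)
print(f"average years lost per death: {years_lost:.1f}")
```

The point of the calculation is that even when most deaths occur in the oldest age bands, the remaining life expectancy in those bands is not zero, so the average loss per death can still amount to several years.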

There are very few countries where the data on the number of cases are completely reliable, except perhaps Germany. In certain countries with a lack of testing, like Bolivia, the only way to follow the pandemic is via excess mortality. In France, we are somewhere in the middle. We have data for the number of cases, but excess mortality can give a truer picture of the figures for comparison. In the USA, the Covid-19 case figures are not well measured. This is a politically sensitive issue. For me, making data like this open and transparent is part of a healthy democratic society.

Do your findings on the analysis of excess mortality still hold for the second wave?

Not all the data are available yet: the second wave is still under way and all-cause mortality data are made available with a two-week delay. However, on the basis of the data available so far, it seems to me that the phenomenon observed during the first wave is now less marked. In other words, the difference between Covid-19 mortality observed in hospital, on the one hand, and excess mortality, on the other, is still there but is smaller. At this stage, any explanation is necessarily very hypothetical, but it may simply be that retirement homes are now better able to cope with Covid-19.

To find out more, you can visit Nicolas Chopin’s blog.

Interview by James Bowers

Contributors

Nicolas Chopin
Professor of data science and machine learning at the ENSAE Centre for Research in Economics and Statistics*

Nicolas Chopin is the author or co-author of one book and more than 60 research papers on topics such as computational statistics, Bayesian inference, and probabilistic machine learning. He is a fellow of the IMS (Institute of Mathematical Statistics), a member and former secretary of the research section of the RSS (Royal Statistical Society), and a current or former associate editor for Annals of Statistics, Biometrika, Journal of the Royal Statistical Society, Statistics and Computing, and Statistical Methods & Applications.
*CREST: a joint research unit of CNRS, École Polytechnique - Institut Polytechnique de Paris, ENSAE Paris - Institut Polytechnique de Paris, GENES