Covid-19 in the US: We’re not getting full value from our data

The way many states currently collect and present data means that debates on policy fall prey to convenient narratives, rather than evidence based assessments, say Caitlin M Rivers and Natalie E Dean

After a hopeful May in which the incidence of covid-19 steadily declined, the United States is now facing an alarming resurgence. Nationally, reported case counts are increasing week after week. In states like Texas, Florida, and Arizona where numbers are rising quickly, new hospitalizations and the percentage of tests that come back positive are also climbing. Taken together, these metrics indicate an outbreak headed in the wrong direction.

Governors and public health officials have noted that the median age of new cases has declined in many states. More testing has uncovered milder infections across younger ages, but that alone does not explain the rise in the number of positive tests and hospitalizations. What is fueling infections in younger adults? On the news are images of crowded bars and beaches. Bars are known to be high risk settings, but we have no such evidence against beaches. Furthermore, with businesses reopening, many Americans are back at work and thus at greater risk than when sheltering at home.

Without careful epidemiological analyses to explore these different hypotheses, the public discussion reverts to convenient narratives rather than an evidence based assessment of where transmission is occurring. These narratives run the risk of either being wrong or only partially right, preventing us from learning important lessons about how to re-open safely. They also risk polarizing discussions by unduly blaming certain groups, at a time when we need the public on board.

Fortunately, the data to investigate the circumstances driving transmission are routinely collected by every public health department in the form of a “line list.” The Centers for Disease Control and Prevention make available a case investigation form that outlines the data that should be collected on each case of covid-19. Most public health departments use some version of this form. Yet analyses of those data, beyond the very basics, are rarely made available to the public.

Currently, many state health departments report their data in the form of dashboards, but the simple statistics displayed are not enough to address our questions. While users can examine trends in numbers of tests, cases, and deaths over time, rarely if ever are these trends broken down by age group or race/ethnicity. Not all states report zip code level numbers, which are needed to track the inherently local dynamics of outbreaks. Not all states even report on the number of people hospitalized. When available, the data are often hard to extract for analysis, presented in PDFs rather than a machine readable format. Once extracted, comparisons across states are nearly impossible because many variables are not standardized, such as categorizations of age or race/ethnicity.

Also underused are data generated from tracing investigations. Contact tracing is valuable both for its ability to break chains of transmission and the wealth of data it generates on where transmission is occurring. While the raw data are often too complex or sensitive to be shared with the public, cluster investigation reports generated by public health departments are incredibly valuable. These inform broad risk communications, as well as targeted testing and health education strategies. In states facing decisions about re-closing certain activities, these analyses could be used to shape efficient policies to minimize broader impacts on the economy.

For many public health departments, the overwhelming demands of responding to the pandemic have precluded spending time to produce these detailed analyses. And years of underinvestment have left departments short on experts who specialize in data analysis. Nonetheless, there are some steps that should be taken now to improve data insights for the public. Departments should publish more detailed, time varying analyses of the data collected through line lists and contact tracing on their dashboards. Accompanying documentation must clearly describe what is being reported and how it should be interpreted.

Public health employees work hard to collect extensive outbreak data, but this information is not being used to its full potential. These data increase in value when they are readable, transparent, and interoperable across states. To supplement their expertise, departments could engage with academic or other partners. Given the crisis, many would be willing to volunteer their time.

And going forward, we should take this as a lesson that investing in data collection and management systems and analytical skills is a critical priority for high functioning public health. Public health professionals have long been calling for upgrades to our information technology infrastructure. It’s time those calls were heeded, so that we are better prepared for the next crisis.

Caitlin M Rivers is an assistant professor at the Johns Hopkins Center for Health Security. Twitter @cmyeaton

Competing interests: I have read and understood BMJ policy on declaration of interests and declare the following interests: None

Natalie E Dean is an assistant professor at the University of Florida Department of Biostatistics. Twitter @nataliexdean

Competing interests: I have read and understood BMJ policy on declaration of interests and declare the following interests: I am on the advisory board of the COVID Tracking Project.
