Where Is Big Data When We Need It Most?

Editor’s Note: This op-ed was originally published on The Hill on April 23, 2020.

“Big Data,” the immense trove of information about our lives being generated at staggering rates, is missing in action against COVID-19. That needs to change if we want to minimize the blunt and economically damaging remedies of stay-at-home orders and non-essential business closures.

Most specifically, electronic health records, which doctors and hospitals have been collecting for a decade, could powerfully and precisely guide the pandemic response, reducing transmission of the disease and improving care for the infected. The key question is whether they could be aggregated anonymously and carefully guarded against use for anything other than population-level public health purposes.

Health care data such as results of routine laboratory tests or x-rays would help find patterns of infection when tests are unavailable. The data could characterize patients at risk for severe disease, give prognoses for infected patients, or estimate the effectiveness of unproven treatments before randomized controlled trials can be performed. These records now live in systems inaccessible to researchers.

Electronic health records could also be enlisted to measure the stresses on health care workers. The data could show, for example, if a single respiratory therapist has been involved in the care of too many patients in one day. Hospitals urgently need to identify overstretched workers who may be placing their patients or themselves at higher risk. National sharing of workforce data could lead to shifting health care workers where they are needed most — such as sending more doctors to one hospital or more nurses to another — even across state lines.

Health care data also exists that would allow us to know whether non-COVID patients are delaying care and suffering consequences from those delays. That is essential knowledge if we are to keep the death toll from the pandemic from spilling over into other diseases.

Combining health data with the vast amount of social data collected by governments and companies could give local, state, and federal public health agencies powerful new tools for tracking and controlling disease outbreak.

Health care data linked to occupational data in IRS records would open the door to understanding which occupations are at greatest risk for developing and spreading disease. Testing strategies and infrastructure efforts (like masks) may naturally target these occupations.

Mobility data (say, the distance smartphone owners travel in a day or the number of unique areas visited) could be combined with health care data and correlated with confirmed infections to estimate, in real time, whether current levels of social distancing are effective, whether stricter public health measures are necessary, or whether areas where health disparities exist need additional social services to prevent infectious spread. It could also be used to demonstrate to the frightened and frustrated public that their sacrifices are making a difference.

Although bringing this data together sounds like a daunting task, it really isn’t. Any financial investment pales compared to the economic costs of this pandemic, which will run into the trillions.

To prevent abuse, aggregated data should be placed in the hands of those who have professional and moral obligations to both public health and privacy. Congress already has advisory commissions, like the Medicare Payment Advisory Commission, to help it with Medicare policy questions. Congress could establish a Pandemic Advisory Response Commission made up of data scientists, epidemiologists, economists, and other researchers qualified to conduct research under strict ethical standards and to handle sensitive data in accordance with the existing privacy rules.

Such a commission might find a way to augment anonymous aggregated records with newly collected non-anonymous data. Tracing an infected patient’s contacts with mobile phone apps that log nearby Bluetooth signals has proven quite powerful in preventing the spread of COVID-19 in South Korea and Singapore. If assured that their data would be handled carefully and not be used for any purpose other than public health, many Americans may agree to volunteer personal data if it serves the greater good of reducing illness and ending the pandemic sooner.

The irony of all this — as anyone bombarded by targeted ads readily understands — is that sophisticated marketers already use our information to sell us their products. Let’s harness Big Data’s power to protect public health and grow the economy.

Christopher M. Worsham is a pulmonologist and critical care physician at Massachusetts General Hospital and Harvard Medical School. Follow him on Twitter: @ChrisWorsham. Anupam B. Jena is Ruth L. Newhouse associate professor of health care policy at Harvard Medical School, an internist at Massachusetts General Hospital, and a faculty research fellow at the National Bureau of Economic Research. Follow him on Twitter: @AnupamBJena. Dana Goldman is director of the USC Schaeffer Center for Health Policy & Economics.