An initiative to create the first large-scale, digital health dataset of Americans that is fully representative across all socio-demographic groups—including ethnic and economic groups usually underrepresented—has been launched as a project of the USC Schaeffer Center for Health Policy & Economics in collaboration with the RAND Corporation and Evidation Health.
Biomedical engineer Ritika Chaturvedi, PhD, who recently joined the USC Schaeffer Center, will serve as principal investigator for the project, called American Life in Real-time (ALiR). ALiR will use digital technologies to create precision public health interventions aimed at reducing health disparities among underrepresented populations by addressing their unique needs and the various dynamic elements that influence health.
“Current research is limited by a lack of complete and representative data sets. Our goal is to change this and ultimately better understand how different populations have different health behaviors and experience different social determinants of health,” Chaturvedi says. “With that information we hope to create precision public health interventions that meet individual needs, rather than relying on our current one-size-fits-all approach.”
The project, funded by a $1.2 million, four-year grant from the National Institutes of Health, will draw a nationally representative subset of individuals from an existing survey panel and connect them to Evidation Health’s Achievement Platform, which is designed to help people share digital data from their everyday lives with researchers and is built on a foundation of user privacy and user control over health data. Each participant will receive a Fitbit device to collect information on metrics such as physical activity, sleep and heart rate. This data will be combined with monthly survey data.
“By using a representative sample and including validated digital health technologies, the project offers a critical opportunity to identify disparities in key health outcomes and metrics, particularly in populations that are vulnerable to negative health outcomes but are currently under-represented in this type of research,” said RAND Senior Behavioral and Social Scientist Wendy Troxel. “This is an exciting project coming at a prescient time, as it will be the largest and longest study to date to capture key health metrics in a representative sample of U.S. adults.”
Most existing data sets contain biases that are magnified when the data are used in research. For example, information collected by internet-enabled devices such as fitness trackers and smart watches is increasingly being used to study public health, but so far such information has been collected primarily from people who purchase the devices on their own. Those people tend to be young, healthy, affluent and female. Because of this field-wide limitation, resulting interventions may systematically underserve the most vulnerable.
“Without truly representative populations, person-generated health data is vulnerable to bias, which can perpetuate health inequities. Our unique partnership will harness a novel dataset, alongside Evidation’s connected cohort, Achievement, to develop generalizable data science methodologies that aim to account for all socio-demographic groups, including the historically underrepresented and underserved,” said Luca Foschini, Chief Data Scientist at Evidation.
While the new panel will be nationally representative, it will also oversample participants from underrepresented groups.
“Leaving out a large portion of the population in these studies inevitably leads to the creation of health disparities,” Chaturvedi says. “Creating the American Life in Real-time panel assures that underrepresented groups are fully represented as digital health studies go forward.”
The team aims to build a suite of data science tools that overcome the systemic bias currently pervasive in data science and artificial intelligence applications of “big data” in healthcare.
This project is supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM013237. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.