Innovators are increasingly harnessing large amounts of data to achieve breakthrough advances in health care. They must also acknowledge and mitigate biases inherent in the data or else risk perpetuating — or even exacerbating — systemic inequities in health care.
To address the issue of bias in data, Health Evolution convened Fellows of the Work Group on Leveraging Data to Improve Health Equity and the Work Group on Governance and Use of Patient Data in Health IT Products in an invitation-only discussion titled Bias In, Bias Out: Addressing Equity Across the Data Pipeline.
“The Work Groups are focused on what we can do to build more trust in the release of information for purposes of developing products that are using some of these newer technologies to have an impact in health care and achieve that dream ‘Goldilocks moment’ of opening up more data while preserving the effectiveness of its use and minimizing bias,” said Aneesh Chopra, President, CareJourney.
Minimizing bias in data collection
Addressing existing biases within large data sets begins with contextualizing where the data comes from, how it is collected, whom it represents, and for what purposes it is used.
For data on demographics and social drivers of health, the best way to minimize bias is to collect data directly from the individual. While it may be tempting to ask doctors to collect more data at the point of care, physicians are already overwhelmed. Even those who understand the downstream benefits are likely to see adding more data collection to their workload as overly burdensome, according to Cris Ross, CIO, Mayo Clinic.
“The only practical way is to collect data in the stream of other activities or from some other source and not by asking the doctor or the nurse to pull out that data,” Ross said.
It’s imperative for innovators to identify sources of bias and where the biases exist in order to correct for them, said Phoebe Yang, General Manager, Healthcare, Amazon Web Services. There are many ways a data set may overrepresent or underrepresent a particular subset of individuals rather than reflect the general population.
“Bias is inherent. The important thing is that you’re not just trying to insert data for the outliers, so to speak, but you’re actually creating the data set that also measures the healthy, because it’s only in the comparison to the healthy that you really understand what’s not healthy,” Yang added. “If you’re only creating data sets around what’s not healthy, your whole analytical framework is skewed.”
Using data to advance health equity
Collecting the appropriate data with a focus on addressing bias will only become more important as leaders actively target solutions for reducing health disparities.
“There is going to be an increasing expectation that we have more high-quality research happening in clinical settings and really bring together clinical care and research, both of which create important demand signals for high quality data that we’re going to need for health equity,” said Amy Abernethy, MD, President, Clinical Research Business, Verily and a former Principal Deputy Commissioner at the FDA.
Laurie Zephyrin, MD, VP of Advancing Health Equity, The Commonwealth Fund, said the U.S. is at an exciting moment at the federal policy level, as agencies such as CMS are taking steps to integrate health equity across their work. Zephyrin added that policies are critical for creating the right conditions, incentives and disincentives: it’s one thing for people to want to work toward advancing equity, but having accountability in place helps organizations make the right decisions.
“It’s important to think beyond the data so that our biases aren’t influencing how we interpret the data,” Zephyrin said. “Then you have to do something about the data. If there are disparities and inequities we have to understand what’s causing them and intervene in some way.”
A new Trust Framework for using de-identified data
A growing number of organizations across the health care system recognize the immense value and potential in establishing data-sharing partnerships that can tackle challenges no single entity can solve on its own. Making more data available is especially important to fuel the development of algorithms and products powered by artificial intelligence. But the industry lacks clear guidelines on how best to approach large-scale sharing of de-identified data, which falls outside the scope of most privacy laws.
The Work Group on Governance and Use of Patient Data in Health IT Products developed, and is seeking input on, a Trust Framework for Accelerating Responsible Use of De-identified Data in Algorithm Development — a necessary first step toward collaborating with industry stakeholders to establish fair and achievable guidelines.
“The goal before the government mandates regulatory action is to determine what we can do to keep ourselves moving in the right direction ethically and responsibly,” Chopra said.
Yang added: “The advantages and benefits of standardizing data around different inputs and workflows would both accelerate us quite a lot and establish a baseline to do more. We have to start somewhere and we should start together.”