The Potential For Bias In Machine Learning And Opportunities For Health Insurers To Address It
The amount of data collected about health care in the United States is enormous1 and continues to grow rapidly. Machine learning has become embedded in the health insurance industry for tasks such as predicting early disease onset,2 determining the likelihood of future hospitalizations,3 and predicting which members are likely to be noncompliant with their medications. Algorithms are often developed to optimize interventions intended to improve health outcomes.
As machine learning is increasingly used in health care settings, there is growing concern that it can reflect and perpetuate past and present systemic inequities and biases. Researchers have begun to evaluate algorithms and their effects on disadvantaged or marginalized populations. In one notable study, algorithms used to identify patients for a care management program perpetuated racial disparities,4 further contributing to racial inequities in health care use and disease outcomes.5–8 This research led to immediate calls for greater transparency and accountability across the health care industry in how algorithms are audited and how bias in predictive models is avoided.9
We examine issues of bias and fairness from the health care payer perspective, outlining common sources of and potential solutions to bias in algorithms. These concerns are applicable to any computational tools used by insurers, from linear models to neural networks, but we focus on machine learning methods because of their complexity and opacity. We outline three use cases common among health insurers for identifying and stratifying members who may benefit from care management programs. We then address how entities in the health insurance ecosystem can identify and remediate bias in these cases and beyond. See the online appendix for a summary of the health care data collected by the US health insurance industry, the main stages of machine learning pipelines where bias arises, common sources of bias in predictive health care models, and potential solutions.10
Common Uses Of Predictive Modeling By Insurers
Health insurers use predictive modeling to identify members with complex health needs for interventions and outreach, including care coordination and condition management. To identify and prioritize members for outreach, most health plans rely on some combination of risk scores from commercial vendors, outputs from one or more predictive models, and “if-then” type business rules.
Because these risk-based prioritization strategies drive the allocation of valuable health care resources, the underlying algorithmic processes should undergo regular audits to identify potential biases. We describe how sources of bias related to problem selection, outcome definition, and data availability and reliability manifest across three models commonly used among health insurers to prioritize care management.
Disease Onset
Six in ten US adults have a chronic disease, and four in ten have two or more chronic diseases.11 Chronic diseases are significant causes of death, disability, and reduced quality of life and account for trillions of dollars in annual health care costs. Many chronic diseases may be effectively managed through smoking cessation, nutrition counseling, or medication adherence programs. As a result, models predicting the onset of the most prevalent diseases, especially those tracked by the Centers for Medicare and Medicaid Services (CMS) for quality performance assessments,12 are common among health insurers.
When a predictive model is being developed, a fundamental source of bias is the initial selection of the prediction problem. Models are less common for diseases that tend to affect smaller or minority segments of the member population (such as sickle cell anemia) or that might not have well-defined or easily scalable interventions. Yet targeting such conditions could greatly impact morbidity, mortality, and health care costs for those with the condition.
Another common source of bias in disease onset models is the differential availability of the data required to identify a target outcome and generate features for predictions. Clinical indicators in claims and in electronic medical record (EMR) data are more likely to be missing or populated at lower frequency for members with less health care use. Moreover, the data reported on claims reflect disparities in provider treatment and diagnosis stemming from implicit and explicit bias, including racism.5 Further, data related to previous diagnoses and procedures, other medical history, or stage of disease may be missing differentially across groups, adversely affecting predictions. Incorporating data on the social determinants of health, including health care access; poverty; education level; employment; housing; exposure to hazards in living and occupational environments; and access to transportation, food, and health clinics, may improve the performance of disease onset models and reduce the reliance on utilization patterns alone for need-based optimization.
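As a minimal sketch of that last point, the snippet below joins area-level social determinants of health measures onto claims-derived features by census tract; the data frames, column names, and values are hypothetical, and a real implementation would draw on sources such as the American Community Survey.

```python
import pandas as pd

# Claims-derived features (member level) and area-level social determinants of
# health measures (census tract level); all names and values are hypothetical.
claims_features = pd.DataFrame({
    "member_id": [1, 2, 3],
    "census_tract": ["42101001500", "42101002000", "42101001500"],
    "n_pcp_visits": [4, 0, 1],
    "n_rx_fills": [12, 0, 3],
})
sdoh = pd.DataFrame({
    "census_tract": ["42101001500", "42101002000"],
    "pct_below_poverty": [0.31, 0.08],
    "pct_no_vehicle": [0.22, 0.05],
})

# An area-level join gives members with sparse claims informative features
# instead of rows that are mostly empty, reducing reliance on utilization alone.
features = claims_features.merge(sdoh, on="census_tract", how="left")
print(features)
```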
Likelihood Of Hospitalization
According to CMS, hospitalizations represented the largest component of national health care expenditures in 2017 and 2018.13 While many acute inpatient events such as maternity and trauma admissions are unavoidable, others are preventable through effective primary and specialty care, disease management, availability of interventions at outpatient facilities, or all of the above. The Agency for Healthcare Research and Quality (AHRQ) estimated that in 2017, 3.5 million preventable inpatient hospitalizations accounted for $33.7 billion in hospital costs.14
Machine learning models that predict the likelihood of an avoidable inpatient hospitalization (known as likelihood of hospitalization models) can help target interventions, prevent adverse health outcomes, and reduce individual and population health care costs.15–18 However, observing an acute hospitalization event in the data is contingent on access to and use of health care services, both of which are influenced by racial and socioeconomic disparities.11,19 Disparities in access and use mean that some subpopulations are underrepresented in the target population and in the data used to predict the outcome of interest. Thus, the resulting model output may reflect those systemic biases, and interventions or policy decisions based on the model outputs risk reinforcing and exacerbating existing inequities.
Similar to disease onset models, one way to address the data disparities in likelihood of hospitalization models is through inclusion of additional data sources that show patterns in primary or preventive care that can prevent unplanned hospitalization. EMR data can add granularity to clinical events, capturing diagnostic and other health information that may not be recorded on claims. However, integrating EMR and claims data can introduce additional bias20 stemming from missing or incomplete records for patients who experience barriers to consistent care. Importantly, missing clinical codes can indicate lack of key diagnostics, procedures, or primary care support along a patient’s health care journey that might have precluded the need for inpatient hospitalization. Similar symptoms may be treated differently among providers, leading to downstream effects on hospitalization. Data on social determinants of health can also improve the performance, and potentially the interpretability, of likelihood of hospitalization models.
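One pragmatic way to keep that missingness visible is sketched below: left-join EMR features onto claims and carry an explicit indicator for members with no EMR record, rather than silently imputing values for them. The column names are assumptions.

```python
import pandas as pd

def merge_with_missingness_flag(claims: pd.DataFrame, emr: pd.DataFrame,
                                on: str = "member_id") -> pd.DataFrame:
    """Left-join EMR features onto claims and flag members with no EMR record.

    The explicit indicator keeps "no EMR data" visible to the model and to
    auditors, instead of silently imputing values for members who face
    barriers to consistent care. Column names are illustrative.
    """
    merged = claims.merge(emr, on=on, how="left", indicator=True)
    merged["has_emr_record"] = (merged["_merge"] == "both").astype(int)
    return merged.drop(columns="_merge")
```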
Medication Adherence
In 2003 the World Health Organization noted that approximately 50 percent of patients with chronic illnesses do not take medications as prescribed.21 In the United States, lack of medication adherence can lead to morbidity and mortality and is estimated to cost $100 billion per year.22 CMS also considers medication adherence to be a critical component of Medicare health plan performance ratings, making predictive models for medication adherence common across the health insurance industry. Adherence is also associated with reduced health services use and lower medical costs for many chronic conditions.23
Predictive models often help health insurers’ pharmacy departments design member outreach strategies to improve adherence. These models can be developed using regression or classification approaches. Regression-based approaches typically predict the proportion of days covered, defined as the proportion of days during a calendar year that a member has access to their medications, and classification approaches use a proportion of days covered of greater than 80 percent as a target threshold.
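As a minimal sketch of these two target definitions, the function below computes a proportion of days covered from pharmacy fill dates and days supplied and then derives the binary adherence label at the 80 percent threshold; the data layout and the simple overlap handling are illustrative assumptions, not a standard implementation.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, year=2021):
    """Share of days in the calendar year with medication on hand.

    `fills` is a list of (fill_date, days_supply) tuples; overlapping fills
    are simply deduplicated by day, which is one of several conventions.
    """
    start, end = date(year, 1, 1), date(year, 12, 31)
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / ((end - start).days + 1)

# Regression target: the PDC itself. Classification target: PDC at or above 0.80.
fills = [(date(2021, 1, 5), 90), (date(2021, 4, 10), 90), (date(2021, 8, 1), 90)]
pdc = proportion_of_days_covered(fills)
print(round(pdc, 2), pdc >= 0.80)  # 0.74 False
```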
Medication adherence can be influenced by many factors, including dosing frequency, side effects, and routes of administration. However, differences in diagnosis, treatment, and prescribing are also well documented. Compared with White patients, members of racial and ethnic minority groups are less likely to be prescribed opioids for chronic pain and less likely to receive evidence-based prescribing practices related to antidepressants, anticoagulants, diabetes medications, drugs for dementia, and statins.24–32 When medication adherence models are being designed, an alternative target definition, based on whether a member should have a prescription for a condition according to clinical care guidelines, may therefore be more appropriate.
Using machine learning to identify patients at risk for being noncompliant with a new medication regimen or for falling below an optimal level of adherence over time can be valuable for targeting resources and programs. However, health plans and other entities that develop and use medication adherence models (such as pharmacy benefit managers and health systems) must recognize how systemic biases in access to pharmacies and prescription drugs, prescribing patterns, and utilization in Black and Brown communities affect problem formulation, algorithm development and interpretation, and intervention strategies.33–36
Understanding why a member was predicted to be noncompliant is particularly relevant when medication adherence interventions are being selected and implemented. Collaborations between interventionists and data scientists can ensure that relevant contextual information is used to refine the predictive model at hand. For example, instead of predicting medication adherence directly, data scientists can identify members most receptive to lower-cost medication alternatives or nontraditional delivery methods, as these are likely to be patients struggling with financial or transportation barriers.
Auditing Machine Learning Pipelines For Bias
Fortunately, there are several ways to check predictive models and business processes for bias, and health insurers should establish standard but flexible protocols for auditing their models and processes. Here we outline several practical approaches, and we note that there is likely no “one-size-fits-all” solution.
Representational Fairness
One way to check for bias is to examine rates of outreach and engagement in care management programs relative to the proportions of subgroups in the data. For example, an eligible population may be 40 percent White, 30 percent Black or African American, 20 percent Hispanic or Latino, and 10 percent Asian. If the proportions of those targeted for outreach and engaged in care management do not reflect the underlying population distribution, one might conclude that there was an element of representational bias.37 Note, however, that this method does not report whether resources were appropriately allocated. That is, there may be reasons to distribute resources equitably based on true care needs, with higher rates of engagement from some subpopulations than others, rather than equally.
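A minimal sketch of that check, using the hypothetical population shares above and invented outreach counts, compares each subgroup’s share of outreach with its share of the eligible population:

```python
# Eligible population shares from the hypothetical example above; outreach
# counts are invented for illustration.
eligible_share = {"White": 0.40, "Black": 0.30, "Hispanic": 0.20, "Asian": 0.10}
outreach_counts = {"White": 520, "Black": 290, "Hispanic": 130, "Asian": 60}

total_outreach = sum(outreach_counts.values())
for group, share in eligible_share.items():
    outreach_share = outreach_counts[group] / total_outreach
    # A ratio far from 1.0 flags possible representational bias, although it
    # says nothing about whether resources matched true care needs.
    ratio = outreach_share / share
    print(f"{group:8s} outreach {outreach_share:.2f} vs eligible {share:.2f} (ratio {ratio:.2f})")
```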
Counterfactual Reasoning
Counterfactual reasoning asks the question, If a given person were from a different subpopulation but had the same health profile, would they have received the same predicted probability of an outcome? For care management, the analogous question compares care management program enrollment for Black and White patients. Researchers found that when patients were prioritized by risk scores—representing patient medical costs—from a predictive algorithm, only 17 percent of the patients eligible for a care management program were Black.4 To simulate a correction, researchers swapped sicker Black patients for less sick White patients at each level of risk until no more swaps were possible, with sickness measured by total number of chronic conditions. In this synthetic correction, 46 percent of the patients qualifying for the care management program were Black. By assessing counterfactual fairness,38 it is possible to examine how a model treats both race and other potentially unmeasured confounding factors that may be correlated with race.
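The snippet below is a simplified sketch of that swap-based correction, not the original study’s code: it assumes a table with race, risk_score, and n_chronic columns, fills the program by risk score, and then swaps within risk deciles in a single greedy pass.

```python
import pandas as pd

def swap_correction(df: pd.DataFrame, capacity: int) -> pd.DataFrame:
    """Simplified sketch of the swap-based correction described above.

    `df` has assumed columns race, risk_score, and n_chronic. The highest
    risk scores fill the program up to `capacity`; then, within each risk
    decile, enrolled White members are swapped for sicker non-enrolled Black
    members (sickness = number of chronic conditions).
    """
    df = df.sort_values("risk_score", ascending=False).copy()
    df["enrolled"] = [i < capacity for i in range(len(df))]
    df["risk_decile"] = pd.qcut(df["risk_score"], 10, labels=False, duplicates="drop")

    for _, grp in df.groupby("risk_decile"):
        ins = grp[grp.enrolled & (grp.race == "White")].sort_values("n_chronic")
        outs = grp[~grp.enrolled & (grp.race == "Black")].sort_values("n_chronic", ascending=False)
        for (i_idx, i_row), (o_idx, o_row) in zip(ins.iterrows(), outs.iterrows()):
            if o_row.n_chronic > i_row.n_chronic:  # swap only if it raises sickness
                df.loc[i_idx, "enrolled"] = False
                df.loc[o_idx, "enrolled"] = True
    return df
```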
Error Rate Balance And Error Analysis
Error rate balance involves comparing false positive and false negative rates for predictions within specified subpopulations.39 Analyses might compare the rates of false positives and false negatives by race, ethnicity, or gender. For example, a chi-square test can be used to compare the rates of false positives (and false negatives) by gender. A statistically significant result would indicate that the model does not predict equally well for both groups and therefore has some degree of bias vis-à-vis the error rate balance criterion.
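A minimal sketch of that test, with invented counts of false positives and true negatives among members whose true label was negative, split by gender:

```python
from scipy.stats import chi2_contingency

# Among members whose true label was negative: false positives and true
# negatives by gender (invented counts).
table = [[120, 880],   # women: [false positives, true negatives]
         [190, 810]]   # men

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
# A small p-value suggests the false positive rate differs by gender, i.e.,
# the model violates the error rate balance criterion for this error type.
```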
Error rate balance reports patterns that the model is detecting and missing. It increases understanding of why the model is making classification errors by examining members and groups who are most likely to receive an incorrect prediction. For example, a model predicting chronic disease occurrence may be less accurate for members with specific conditions, for members of certain races or ethnicities, or for members who live in certain geographies or see certain providers. Researchers can then investigate where the machine learning pipeline can be improved and, in the context of a chronic disease occurrence prediction task, may decide to optimize to reduce false negative rates over false positive rates. Potential strategies are to adjust upsampling or downsampling rates in the training data or to generate different models for different subpopulations. In addition to data-based solutions, reviewing errors with a diverse set of stakeholders, who can provide context from lived experience about why specific types of errors occur and what impact they have, can reduce the unintentional harm caused when different types of errors are made.
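As one hedged illustration of the first strategy, the helper below upsamples the training rows of a single subgroup using scikit-learn's resample; the column names and duplication factor are assumptions, and any effect on subgroup error rates has to be verified on held-out data.

```python
import pandas as pd
from sklearn.utils import resample

def upsample_group(train_df, group_col="race", group_value="Black", factor=2, seed=0):
    """Duplicate (with replacement) the training rows of one subgroup.

    A blunt rebalancing lever: it increases the subgroup's weight in training,
    but its effect on that subgroup's false negative rate must be checked on
    held-out data. Column names and the factor are illustrative.
    """
    group_rows = train_df[train_df[group_col] == group_value]
    extra = resample(group_rows, replace=True,
                     n_samples=(factor - 1) * len(group_rows), random_state=seed)
    return pd.concat([train_df, extra], ignore_index=True)
```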
When bias is identified, it is important for stakeholders to have transparent discussions about whether and how the biases are problematic, and the potential gaps in data or other aspects of model development that could have led to the bias. Stakeholders should strategize about different modeling approaches that could reduce bias, including redefining the target outcome; experimenting with sampling methods, data augmentation, or restriction; and model class selection. In some instances, solutions may lead to models that have poorer fit but that may be fairer, in which case stakeholders need to adhere to ethical principles in balancing model performance, business needs, and health equity.
Addressing Bias In Machine Learning As An Industry
Health insurers share several challenges in assessing and reducing bias that could be addressed collaboratively as an industry. While these themes are not exhaustive, we believe that they represent primary areas where the field of fair machine learning has the potential to make major advances in the coming months and years.
Industry Vigilance
Algorithmovigilance refers to scientific methods and activities relating to the evaluation, monitoring, understanding, and prevention of adverse effects of algorithms in health care.40 Calls from several sectors for the health care industry, including health insurers, to monitor and evaluate machine learning models for bias have been increasing. In January 2021 Pennsylvania’s new Interagency Health Reform Council recommended that payers and providers review and revise their predictive analytics and algorithms to remove bias.41 The National Committee for Quality Assurance (NCQA) and AHRQ have also taken an interest in the impact of health care algorithms on racial disparities in health and health care. For example, the NCQA is incorporating evaluation of racial bias into accreditation standards.42 In addition, legislation introduced in the House and Senate in 2019—the Algorithmic Accountability Act—would have required certain commercial entities to conduct assessments of high-risk systems that involve personal information or make automated decisions, such as machine learning. This attention to bias in health care algorithms has led to the development of and renewed attention to guidelines, best practices, and analytics tools related to the evaluation and use of algorithms in predictive analytics.43 These tools have the potential to inform and unify the entire payer space to combat bias and enable health insurers to more effectively provide high-quality, equitable care and services to members. Ultimately, these tools will require testing at scale and constant and rigorous evaluation to ensure that they are having the intended positive impacts on member populations and that models tuned for fairness do not undergo “bias drift” over time or during business implementation.
Algorithmovigilance requires that machine learning models be designed in ways that can be empirically examined. Health care companies should incorporate known methods for identifying and remediating algorithmic bias into their machine learning pipelines and participate in the ongoing development and dissemination of new methods. Whether models are generating insights and resulting in actions that maximize the intended outcome, such as reducing acute hospitalizations in a population, should be assessed regularly. Evaluations should not be limited to the model output but should also assess the impact of actions taken based on model results and should examine whether impacts were differential across relevant subgroups.
Models that are both accurate and fair will lead to interventions and business practices that ultimately benefit members at the highest levels of risk and need and lead to better outcomes and lower costs.
Obtaining And Ethically Using Race And Ethnicity Data
Data on members’ race and ethnicity could enhance medical management programs and facilitate audits for possible racial bias in both algorithmic output and care management outreach. Yet most health plans do not collect race, ethnicity, or primary or preferred language data as part of the enrollment process or in any other systematic way.
CMS has recently made race, ethnicity, and language data available to health plans for Medicare Advantage enrollees. For commercially insured members, individual-level data may be available in EMR data from provider health systems, although not all health systems provide EMR data to payers. Health plans may also obtain these data from surveys, although surveys are usually administered to subsets of the member population. Third-party vendor data also contain information on race, ethnicity, and language, but match rates with health plan membership vary, as does the specificity of the data. Race imputation using statistical estimation techniques such as Bayesian Improved Surname Geocoding or Bayesian Improved First Name Surname Geocoding44,45 may also be embedded with bias. Data on race, ethnicity, and language can also be obtained at the census block or tract level through the American Community Survey, but these data sources do not provide individual-level specificity and are limited to five single-race groups, which does not sufficiently capture heterogeneity within a community.
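To make the imputation step concrete, the sketch below applies the Bayes’ rule at the core of Bayesian Improved Surname Geocoding, combining a surname-conditional race distribution with geography-by-race proportions; the probability tables are invented for illustration rather than taken from Census data, and errors or staleness in the real inputs are one route by which bias enters.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """P(race | surname, geo) is proportional to P(race | surname) * P(geo | race)."""
    unnormalized = {race: p_race_given_surname[race] * p_geo_given_race[race]
                    for race in p_race_given_surname}
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}

# Invented inputs for one member: a surname-based prior and the share of each
# group living in the member's census block group.
surname_prior = {"White": 0.55, "Black": 0.25, "Hispanic": 0.15, "Asian": 0.05}
geo_likelihood = {"White": 0.10, "Black": 0.60, "Hispanic": 0.20, "Asian": 0.10}
print(bisg_posterior(surname_prior, geo_likelihood))
# Outdated surname lists or block-group data feed directly into this posterior,
# which is one way the imputation itself can embed bias.
```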
Many health plans are hesitant to collect and use data on race, ethnicity, and language, even when provided voluntarily, because of the lack of established regulatory and oversight policies on how to ethically collect, aggregate, use, and report data on race and ethnicity. Establishing these policies at the federal or state level would provide guidance and protections, but this will likely take years to develop and implement. The health insurance industry should coalesce around ethical principles and standards for collecting and using data on race, ethnicity, and language, as well as on other social determinants of health. Entities such as America’s Health Insurance Plans or the NCQA could also establish standard practice protocols, which may include establishing a review board or oversight committee at each health plan that would govern the use of race and ethnicity data in analytics and reporting.
Addressing Missing Data And Bad Proxies
Member health data are not collected unless a provider is seen, resulting in more missing data on populations that face obstacles to accessing care. Even when care is delivered, disparities in treatment and diagnosis contribute to incomplete and even incorrect data.5,6 Sometimes, proxies for a particular target variable or for individual features are used, but they also can be flawed and exacerbate bias.4 For example, member race used as a feature in a model for condition onset should not be used to make claims about underlying genetic differences. Race is a proxy for systemic racism and should be considered in interaction with other data, including social determinants of health. As another example, health care costs are not an optimal or complete representation of condition complexity.
To facilitate fair machine learning, better methodologies for evaluating and addressing data missingness, sparsity, and irregularities are needed. For example, computers can generate realistic health care data to rebalance data sets, but the synthetic data may in fact perpetuate existing biases.46 Health-related behaviors for high-risk members who underuse care are driven by a multitude of social determinants of health and other environmental factors not captured in data commonly available to health plans. The next generation of machine learning and artificial intelligence in the health insurance industry needs to explicitly consider how to incorporate outside sources of data from social media platforms, wearable devices, crowdsourcing, and other types of small- and large-scale community-level resources. Cross-plan collaborations could also lead to robust insights—for example, spanning members insured through Medicare, Medicaid, and commercial plans throughout the US.
Including All Relevant Voices
Machine learning in health care is developed in response to a business or clinical question. Fairness in machine learning is facilitated by collaborative conversations between machine learning scientists and clinical experts, supplemented by member voices, and guided by the expertise of equity experts. Diverse data science teams, including practitioners with lived experience—especially those who are disproportionately affected by systemic inequities in the health care system—must be intentionally created. Collaboration within and across such teams can reveal blind spots and impediments47 in efforts to promote health equity through predictive analytics.
Conclusion
The responsibility for building and implementing equitable machine learning models lies with the broader health insurance community. Continued machine learning development is inevitable. Opportunities exist to ensure that machine learning is fair, not only on ethical grounds but also on strong operational and business grounds. With recent calls for active vigilance of machine learning and its implementations, institutional and industry commitments to increase equity in health care are needed. These include developing and disseminating best practices in bias detection and remediation, developing targeted programs to reduce bias and promote equity, and deepening involvement and communication with the members and communities served by health plans. With these combined efforts, more equitable health care can be achieved.
ACKNOWLEDGMENTS
Stephanie S. Gervasi and Irene Y. Chen are co–first authors of this work. The authors are grateful to Alya Nadji and two anonymous reviewers for feedback that greatly improved their manuscript. This is an open access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt, and build upon this work, for commercial use, provided the original work is properly cited. See https://creativecommons.org/licenses/by/4.0/.
NOTES
1. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2:3.
2. Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data. 2015;3(4):277–87.
3. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688–98.
4. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53.
5. Inequality in quality: addressing socioeconomic, racial, and ethnic disparities in health care. JAMA. 2000;283(19):2579–84.
6. Inequity in crisis standards of care. N Engl J Med. 2020;383(4):e16.
7. Equitably allocating resources during crises: racial differences in mortality prediction models. Am J Respir Crit Care Med. 2021;204(2):178–86.
8. The crisis in crisis standards of care. Ann Am Thorac Soc. 2021 Feb 5(ja).
9. Ron Wyden [Internet]. Washington (DC): Office of Sen. Ron Wyden. Press release, Wyden, Booker demand answers on biased health care algorithms; 2019 Dec 3 [cited 2022 Jan 11]. Available from: https://www.wyden.senate.gov/news/press-releases/wyden-booker-demand-answers-on-biased-health-care-algorithms
10. To access the appendix, click on the Details tab of the article online.
11. Racial and ethnic differences in 30-day hospital readmissions among US adults with diabetes. JAMA Netw Open. 2019;2(10):e1913249.
12. Centers for Medicare and Medicaid Services. Healthcare Effectiveness Data and Information Set (HEDIS) [Internet]. Baltimore (MD): CMS; [last modified 2021 Dec 1; cited 2022 Jan 12]. Available from: https://www.cms.gov/Medicare/Health-Plans/SpecialNeedsPlans/SNP-HEDIS
13. Centers for Medicare and Medicaid Services. NHE fact sheet [Internet]. Baltimore (MD): CMS; [last modified 2021 Dec 15; cited 2022 Jan 3]. Available from: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/NHE-Fact-Sheet
14. Characteristics and costs of potentially preventable inpatient stays, 2017 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality; 2020 Jun [cited 2021 Dec 23]. (HCUP Statistical Brief No. 259). Available from: www.hcup-us.ahrq.gov/reports/statbriefs/sb259-Potentially-Preventable-Hospitalizations-2017.pdf
15. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688–98.
16. Evaluating the performance of a predictive modeling approach to identifying members at high-risk of hospitalization. J Med Econ. 2020;23(3):228–34.
17. Do payor-based outreach programs reduce medical cost and utilization? Health Econ. 2020;29(6):671–82.
18. The effect of predictive analytics–driven interventions on healthcare utilization. J Health Econ. 2019;64:68–79.
19. Race, ethnicity, and hospitalization for six chronic ambulatory care sensitive conditions in the USA. Ethn Health. 2006;11(3):247–63.
20. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018;178(11):1544–7.
21. Adherence to long-term therapies: evidence for action [Internet]. Geneva: World Health Organization; 2003 [cited 2022 Jan 3]. Available from: https://apps.who.int/iris/bitstream/handle/10665/42682/9241545992.pdf
22. Medication adherence: helping patients take their medicines as directed. Public Health Rep. 2012;127(1):2–3.
23. Adherence to medication. N Engl J Med. 2005;353(5):487–97.
24. Racial disparities across provider specialties in opioid prescriptions dispensed to Medicaid beneficiaries with chronic noncancer pain. Pain Med. 2015;16(4):633–40.
25. Racial-ethnic disparities in opioid prescriptions at emergency department visits for conditions commonly associated with prescription drug abuse. PLoS One. 2016;11(8):e0159224.
26. Racial-ethnic disparities in use of antidepressants in private coverage: implications for the Affordable Care Act. Psychiatr Serv. 2014;65(9):1140–6.
27. Racial-ethnic disparities in stroke care: the American experience: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2011;42(7):2091–116.
28. Racial differences in long-term adherence to oral antidiabetic drug therapy: a longitudinal cohort study. BMC Health Serv Res. 2009;9:24.
29. Ethnic differences in acetylcholinesterase inhibitor use for Alzheimer disease. Neurology. 2005;65(1):159–62.
30. Factors associated with county-level variation in the prescription of statins. J Manag Care Spec Pharm. 2019;25(12):1358–65.
31. Association between pharmacy closures and adherence to cardiovascular medications among older US adults. JAMA Netw Open. 2019;2(4):e192606.
32. Social determinants of pharmacy deserts in Los Angeles County. J Racial Ethn Health Disparities. 2020;8(6):1424–34.
33. Asthma disparities in urban environments. J Allergy Clin Immunol. 2009;123(6):1199–206.
34. Implicit racial/ethnic bias among health care professionals and its influence on health care outcomes: a systematic review. Am J Public Health. 2015;105(12):e60–76.
35. Cultural issues in medication adherence: disparities and directions. J Gen Intern Med. 2018;33(2):200–6.
36. Fairness-aware learning through regularization approach. In: Spiliopoulou M, Wang H, Cook D, Pei J, Wang W, Zaïane O, editors. ICDMW 2011: 11th IEEE International Conference on Data Mining Workshops; 2011 Dec; Vancouver, British Columbia [Internet]. Piscataway (NJ): Institute of Electrical and Electronics Engineers, Inc.; 2011 [cited 2021 Dec 23]. p. 643–50. Available from: https://ieeexplore.ieee.org/xpl/conhome/6136510/proceeding
37. Fairness constraints: mechanisms for fair classification. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics; 2017 Apr 20–22; Fort Lauderdale, Florida. Proceedings of Machine Learning Research [serial on the Internet]. Vol. 54 (2017); [cited 2021 Dec 23]. Available from: http://proceedings.mlr.press/v54/zafar17a/zafar17a.pdf
38. Counterfactual fairness. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, editors. Advances in Neural Information Processing Systems 30 (NIPS 2017); 2017 Dec 4–9; Long Beach, California [Internet]. [place unknown]: Neural Information Processing Systems Foundation; [cited 2021 Dec 23]. Available from: https://proceedings.neurips.cc/paper/2017/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf
39. Equality of opportunity in supervised learning. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems 29 (NIPS 2016); 2016 Dec 5–10; Barcelona, Spain [Internet]. [place unknown]: Neural Information Processing Systems Foundation; [cited 2021 Dec 23]. Available from: https://proceedings.neurips.cc/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf
40. Algorithmovigilance—advancing methods to analyze and monitor artificial intelligence–driven health care for effectiveness and equity. JAMA Netw Open. 2021;4(4):e214622.
41. Interagency Health Reform Council. Health care reform recommendations [Internet]. Harrisburg (PA): Office of Governor Tom Wolf; 2020 Dec [cited 2021 Dec 15]. Available from: https://www.governor.pa.gov/wp-content/uploads/2021/01/IHRC-HCR-Recommendations.pdf
42. National Committee for Quality Assurance. Health equity [Internet]. Washington (DC): NCQA; [cited 2022 Jan 12]. Available from: https://www.ncqa.org/about-ncqa/health-equity/
43. Algorithm bias playbook [Internet]. Chicago (IL): Center for Applied Artificial Intelligence; [cited 2021 Dec 23]. Available for download from: https://www.chicagobooth.edu/research/center-for-applied-artificial-intelligence/research/algorithmic-bias
44. Using first name information to improve race and ethnicity classification. Statistics and Public Policy. 2018;5(1):1–3.
45. Imputation of race/ethnicity to enable measurement of HEDIS performance by race/ethnicity. Health Serv Res. 2019;54(1):13–23.
46. Hurtful words: quantifying biases in clinical contextual word embeddings. In: CHIL ’20: Proceedings of the ACM Conference on Health, Inference, and Learning; 2020 Apr 2–4; Toronto, Ontario [Internet]. New York (NY): Association for Computing Machinery; [cited 2021 Dec 23]. Available from: https://dl.acm.org/doi/10.1145/3368555.3384448
47. Turning the crank for machine learning: ease, at what expense? Lancet Digit Health. 2019;1(5):e198–9.