Critical Thinking Still a Key Element Missing from Algorithmic Intelligence
Clinical algorithms and risk management algorithms have been found to have substantial biases that directly impact access to care for minorities. Sources of bias range from the proxies used in models to the data sources themselves, showing that deeper analyses and approaches to eliminating bias are needed.
The pandemic and Black Lives Matter movement have catalyzed political pressure on the health system to serve communities better. Policy makers, community organizations, and the press have all made this topic a key issue affecting the digitization and modernization of healthcare services today.
A number of technology tools have been developed to find bias in AI/ML systems. This is encouraging, but human oversight is still required. The complexity of the issue of race in medicine is such that a purely technological approach will only go so far.
In mid-April I wrote about some of the challenges brought to light in the first months of the pandemic, when we began noticing the disproportionate disease burden that COVID-19 placed on Black and Latinx communities. As the months have passed and we have witnessed the rise of the Black Lives Matter protests, medicine has emerged as one of the many industries exhibiting clear systemic racial bias. Given the disease burden and likely long-term syndromic issues that COVID-19 is causing, in addition to a disproportionate share of ‘essential workers’ being minorities, this issue has gained significant traction in the broader social discourse.
In the early days of the summer, at the first signs that the political winds had changed, a number of major enterprises banned the sale of their facial recognition technology to police departments after years of criticism of the bias inherent to these tools. In healthcare, we were apprised last fall of a major bias in an algorithm Optum developed for patient risk management. The algorithm used cost data as a label, or proxy, for health status, which incorrectly put white patients at the front of the line for disease management programs despite their better overall health status.
The researchers who uncovered the bias worked with Optum to address the problem, but we have seen New York state regulators call for proof that the algorithm no longer discriminates against Black patients. Senators Cory Booker and Ron Wyden also wrote letters to CMS, FTC and insurers to make sure they address any potential systemic bias in these systems.
This issue is not new to the AI/ML field or to data analytics across the spectrum of health, education, policing and social services. An impressive number of books and studies catalog the issues with algorithm development, the data used for training algorithms, and data settings (yes, settings, or the context in which data are collected). These include Ruha Benjamin’s Race After Technology, Sara Wachter-Boettcher’s Technically Wrong, Roberto Simanowski’s The Death Algorithm and Other Digital Dilemmas, Frank Pasquale’s The Black Box Society, Meredith Broussard’s Artificial Unintelligence, Safiya Noble’s Algorithms of Oppression, Louise Amoore’s Cloud Ethics, and more.
The market is full of AI ethics initiatives, and a wealth of information exists on the dangers and risks inherent to AI. But what measures are offered to address these challenges, avoid patient harm and improve outcomes? How can we approach AI critically but without the apocalyptic rhetoric that may be equally unhelpful?
History, Race and Clinical Algorithms: Be careful when using race as a biological construct in models
A very recent New England Journal of Medicine article highlights some of the difficulties in deploying constructs such as race in clinical medicine. In the early 2000s there was the case of the drug BiDil, approved by the FDA in 2005 for race-specific indications for heart failure. The drug ultimately failed, but the authors use this case study to illustrate how race is often understood in medicine as a proxy for genetic difference, when genetic diversity within a population can be as great as or greater than diversity across racial or ethnic categories. They highlight a number of guidelines currently in use with similarly problematic uses of race that could lead to results that actually discriminate:
- AHA’s Get With The Guidelines-Heart Failure Risk Score for predicting risk of death in admitted patients: all Black patients are categorized as lower risk, which could result in resources being directed away from them.
- The Society of Thoracic Surgeons’ algorithms for estimating the risk of death from surgery use race and ethnicity as proxies without an understanding of the causal mechanisms. This also runs the risk of steering patients rated as high risk away from surgery.
- The Vaginal Birth after Cesarean algorithm, used to predict the risk posed by a trial of labor for someone who has previously undergone cesarean section, predicts a lower likelihood of success for anyone identified as African-American or Hispanic. Several social determinants variables that could confound the result were excluded from the algorithm, and errors in these cases could worsen already high maternal mortality rates for African-American women.
Recent areas where bias has been identified include:
- In January, Google unveiled a new AI application developed by DeepMind for analyzing mammograms for breast cancer. It was immediately criticized for lacking demographic data that could shed light on potential racial bias.
- A computer vision application in dermatology for identifying suspect melanoma lesions with 95% accuracy was found to have been trained on datasets in which only 5% of patients were non-white.
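A headline accuracy figure can hide a weak result for an under-represented group, which is why demographic breakdowns matter in the two examples above. The following is a minimal sketch, assuming a hypothetical list of (group, true label, predicted label) records. The data are synthetic and chosen only to make the arithmetic visible; they are not taken from the dermatology or mammography work.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, {group: accuracy}).

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        correct[group] += int(truth == prediction)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {group: correct[group] / total[group] for group in total}
    return overall, per_group


# Synthetic, illustrative data: 95 records from group "A", 5 from group "B".
records = (
    [("A", 1, 1)] * 93 + [("A", 1, 0)] * 2    # 93 of 95 correct in group A
    + [("B", 1, 1)] * 2 + [("B", 1, 0)] * 3   # 2 of 5 correct in group B
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, accuracy in sorted(per_group.items()):
    print(f"group {group}: {accuracy:.2f}")
```

In this made-up case the overall accuracy is 95% while accuracy for the small group is 40%, so the aggregate number alone says very little about how the tool performs for the patients it saw least often.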
What the clinical algorithm examples illustrate is how using race and ethnicity as broad proxies in research can obscure the more complex biological and social relationships that shape a given health outcome. In the Optum example above, the use of cost data as a proxy for health status occluded racial differences in access to care and under-insurance.
In many of these cases, the use of race risks interpreting racial disparities as “immutable facts rather than as injustices that require intervention.” In many of these examples the developers and users of algorithms will need to tease out the “complex interactions of race, racism, socio-economic status and environment” (NEJM article). The key lesson here is to evaluate the underlying biology and sociology of the health conditions of focus, make sure the pathways are clearly understood, and ensure the model does not inadvertently reinforce structural racism.
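One way to check a proxy label such as cost is to ask whether patients with similar risk scores are equally sick across groups, using a health measure that is independent of the label. The sketch below assumes hypothetical (group, risk score, chronic condition count) tuples with risk scores between 0 and 1; the field choices, the band width and the data are illustrative assumptions, not details of the Optum algorithm.

```python
from collections import defaultdict
from statistics import mean


def illness_by_score_band(patients, band_width=0.25):
    """Average an independent health measure per (score band, group).

    `patients` is an iterable of (group, risk_score, chronic_conditions).
    """
    buckets = defaultdict(list)
    for group, risk_score, chronic_conditions in patients:
        band = int(risk_score // band_width)
        buckets[(band, group)].append(chronic_conditions)
    return {key: mean(values) for key, values in buckets.items()}


def band_gaps(table, group_a, group_b):
    """Per band, mean(group_a) minus mean(group_b), where both are present."""
    bands = {band for band, _ in table}
    return {
        band: table[(band, group_a)] - table[(band, group_b)]
        for band in sorted(bands)
        if (band, group_a) in table and (band, group_b) in table
    }


# Synthetic, illustrative data only.
patients = [
    ("group_1", 0.80, 5), ("group_1", 0.85, 6),
    ("group_2", 0.80, 3), ("group_2", 0.90, 4),
    ("group_1", 0.30, 2), ("group_2", 0.35, 1),
]

table = illness_by_score_band(patients)
gaps = band_gaps(table, "group_1", "group_2")
for band, gap in gaps.items():
    print(f"score band {band}: group_1 has {gap:+.1f} more chronic conditions")
```

A consistently positive gap would suggest that the score understates need for the first group, which is the pattern a cost-based label can produce when one group has less access to care. Deciding what to do about such a gap still requires the clinical and social analysis described above.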
Technological Tools for Assessing Algorithms and Bias in Services
Within the industry there are a number of initiatives to address bias in AI. The field of FAT ML (fairness, accountability and transparency in ML) works towards building better algorithms that can assist regulators and others in uncovering the undesirable and unintended consequences of machine learning. A healthcare-focused variant of this is found in Fair ML for Health. There are also a number of technology partnerships, such as the Partnership on AI and the Ethics and Governance of AI Initiative, to name a few. A recent article in Boston Review takes issue with the notion that purely technological interventions aiming for neutrality will successfully address bias in data and/or data settings. The authors emphasize that regulatory bodies and human oversight, through collaborations across developers, users and patient groups, will be needed.
In healthcare we are increasingly looking at voice recognition and assistants for everything from diagnosis to chatbots. However, there are similar bias issues found in the underlying algorithms for voice-based AI as well. A recent study of all of the major voice assistant platforms found that these systems misidentified 35% of the words spoken by African-Americans. To address this issue, Artie, a platform for games on social media, has developed a tool for detecting demographic bias in voice apps.
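The share of misidentified words is usually expressed as a word error rate: the word-level edit distance between what was said and what the system transcribed, divided by the number of words spoken. The sketch below shows how such a rate could be compared across speaker groups. The function names, group labels and transcripts are hypothetical, and this is not a description of how the cited study or Artie's tool works.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance: substitutions + insertions + deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word dropped
                current[j - 1] + 1,      # extra word inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer_by_group(samples):
    """Word error rate per group.

    `samples` is an iterable of (group, reference, hypothesis) transcripts.
    Groups with no reference words are skipped.
    """
    errors, words = {}, {}
    for group, reference, hypothesis in samples:
        errors[group] = errors.get(group, 0) + word_errors(reference, hypothesis)
        words[group] = words.get(group, 0) + len(reference.split())
    return {g: errors[g] / words[g] for g in errors if words[g]}


# Synthetic, illustrative transcripts only.
samples = [
    ("speaker_group_1", "refill my blood pressure prescription",
     "refill my blood pressure prescription"),
    ("speaker_group_2", "refill my blood pressure prescription",
     "we fill my blood pleasure prescription"),
]

rates = wer_by_group(samples)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} word error rate")
```

Run on a set of recordings balanced across speaker groups, a comparison like this gives a first indication of whether a voice interface will work equally well for all of the patients expected to use it.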
Dashboards are another tool in use at a number of hospital systems during the current pandemic. Mass General Brigham Hospital developed a dashboard and data infrastructure to track disparities in the impact of COVID-19 on patients and staff. The tool is interesting from a number of perspectives, including:
- Focus on actionable measures including ventilators, ICU capacity, deaths and discharges
- Analytics focused on intersectional considerations including geography, gender, socio-economic status and not race alone
- Built upon existing quality and safety structures
- Tailoring of data to targeted users based on needs
- Recognition that employees in services beyond clinical areas were tested less often and would need more comprehensive analysis
The key takeaway: there are a number of technological tools available that can help organizations identify biases and address them, but technology alone will rarely be enough. Human oversight is key.
Another area worth exploring in this post is the data sources themselves. What is the context of data collection, and are there factors in it that can be a source of bias? STAT News reported a year ago on the reliance of a large number of healthcare AI/ML algorithms on the MIMIC dataset, drawn from Beth Israel Deaconess Medical Center in Boston. Do the features of this patient population actually reflect the populations where the algorithms are being deployed now? Other research has raised concerns that the green LEDs used in wearables are inaccurate on dark skin. Red LEDs are far less problematic, and some next-generation sensors will use them. How does one know that the data from remote patient monitoring programs fed into AI engines for analysis are not biased at the source?
Finally, it is important in our current context to address these issues as an industry. We know that essential workers are disproportionately poorer and from minority groups, and that they are chronically underserved by the health system. If AI/ML tools are going to be successful in the marketplace, they will need to demonstrate their ability to improve outcomes for those suffering a greater disease burden.
A great deal of collaborative work has emerged from the challenges of the pandemic, such as data commons (e.g., N3C), open publishing of research, and great improvements in data sharing. The next step is to bring together the right set of tools and critically minded researchers, patient groups, and other stakeholders to help innovate around robust algorithms that can equitably address the disease burden in the country.
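The question of whether a training population resembles the population where an algorithm is deployed can be given a rough first answer by comparing how a characteristic is distributed in each. The sketch below assumes two hypothetical lists of categorical values (here a made-up "setting" attribute); the categories and counts are illustrative and say nothing about the actual composition of MIMIC.

```python
from collections import Counter


def proportions(values):
    """Share of each category in a non-empty list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute proportions of an empty list")
    return {category: count / total for category, count in counts.items()}


def proportion_gaps(training_values, deployment_values):
    """Per category: deployment share minus training share."""
    train = proportions(training_values)
    deploy = proportions(deployment_values)
    categories = sorted(set(train) | set(deploy))
    return {c: deploy.get(c, 0.0) - train.get(c, 0.0) for c in categories}


# Synthetic, illustrative data only.
training_setting = ["urban"] * 8 + ["rural"] * 2
deployment_setting = ["urban"] * 5 + ["rural"] * 5

shift = proportion_gaps(training_setting, deployment_setting)
for category, gap in shift.items():
    print(f"{category}: {gap:+.0%} in deployment relative to training")
```

Large gaps do not prove that an algorithm will perform badly, but they flag where its performance should be re-checked on local data before it is relied upon.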