An evaluation of Epic’s Sepsis Model revealed poor performance and underscored the need for peer review of algorithms, a practice that proprietary models often lack. Adoption of new technologies can be strengthened by transparency and by peer review that confirms the accuracy and solid performance of models used in AI/ML.
Studies of other sepsis models demonstrate that multidisciplinary stakeholders are important for adoption, but additional work is needed to integrate a model into clinical workflows. Institutional and social systems that are being “disrupted” may require repair; that is, additional labor from people, such as nurses, whose work is often undervalued and under-recognized.
The underlying clinical science of sepsis still lacks a coherent evidence base, which makes AI/ML development for sepsis challenging. For clinical conditions where the evidence base lacks clarity, or where therapies and the science change frequently, bias in the labels that go into a model can become a substantial problem.
Sepsis is a major healthcare issue in the US. Currently there are nearly 1.5 million hospitalizations annually, resulting in longer lengths of stay and nearly 250,000 deaths each year. The complexity of cases and the magnitude of the problem have led many researchers and institutions to build clinical decision support (CDS) tools to help clinicians identify sepsis cases earlier and improve long-term outcomes.
Early identification of cases matters because selecting the correct antimicrobial therapy at the correct time optimizes outcomes. Misidentifying pathogens or their antibiotic resistance has a substantial impact on outcomes and can significantly increase mortality.
A recent analysis of Epic’s Sepsis Model (ESM) by researchers at the University of Michigan found that the model fell short in accurately identifying sepsis cases. The model was trained on over 400,000 patient records across 3 health systems and included billing codes among its training data.
The study by Wong et al. examined nearly 27,700 patients with 38,455 hospitalizations between late 2018 and autumn 2019. The results indicated that the model missed 67% of the sepsis cases and produced frequent false alarms, contributing to alert fatigue among clinicians. The authors raise red flags about the lack of transparency in proprietary algorithms, such as Epic’s, and their widespread use across the healthcare system.
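The numbers behind “missed 67% of cases” and “frequent false alarms” boil down to two standard metrics: sensitivity (how many true cases the model catches) and positive predictive value (how many alerts are real). A minimal sketch of how they are computed, using hypothetical confusion-matrix counts rather than the study’s actual data:

```python
def sensitivity(tp, fn):
    """Fraction of true sepsis cases the model flags (a.k.a. recall)."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of alerts that are true sepsis cases (a.k.a. precision)."""
    return tp / (tp + fp)

# Hypothetical counts for illustration only:
tp, fn, fp = 33, 67, 242   # caught cases, missed cases, false alarms

print(round(sensitivity(tp, fn), 2))  # → 0.33, i.e. 67% of cases missed
print(round(ppv(tp, fp), 2))          # → 0.12, i.e. most alerts are false alarms
```

A model can miss two-thirds of cases and still fire constantly: the two failure modes are independent, which is why both metrics matter when judging an alerting tool.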
The ESM case is only the latest machine learning model used in clinical practice found to have flaws or bias, and it feeds warranted skepticism among clinicians about using AI/ML for decision support in healthcare.
Models, algorithms, and documentation that have not been peer reviewed may not accurately reflect real performance in the clinic. A closer look at the clinical dimensions of sepsis, at more human-centered design approaches for AI, and at emerging views on transparency may show how to improve these tools and build trust.
Reactions to the University of Michigan Study
Epic responded to the study by citing a separate study, published this past January, that involved 11,500 patients at Prisma Health (South Carolina) and reported a 4% decrease in mortality from sepsis. But is this enough to trust the model and the model maker?
Alternatively, John Halamka, President of the Mayo Clinic Platform, asks whether digital health should take a page from Amazon and Walmart in how they utilize design thinking, workflow integration, and implementation science with the algorithms and vendors in their supply chain to ensure quality outcomes for users. These processes involve using stakeholders from different disciplines and engaging with the socio-cultural processes that impact use and development of the models.
I’m not so sure that this alone is enough.
Another much-lauded sepsis algorithm, developed at Duke University Medical School with the involvement of a cultural/medical anthropologist, may shed some light on this challenge. The model, called Sepsis Watch, took 3½ years and over 32 million data points to build and was implemented in 2018. Stat News covered how the anthropologist engaged stakeholders so that adoption of the model would be enhanced by an understanding of the socio-cultural issues.
But a more recent report highlighted remaining challenges. Data and Society examined Sepsis Watch from a socio-technical perspective and documented the amount of human labor required to harmonize a socio-technical intervention such as Sepsis Watch with existing clinical workflows. They note that there are “breakages,” changes that require human resources, such as nurses, to do the additional labor of integrating the innovation into workflows. Often this work is undervalued or unrecognized.
The Clinical Picture for Sepsis
Even the clinical dimension of sepsis is blurrier than most would expect. In the months prior to the Michigan study, a paper appeared in Frontiers in Medicine highlighting that the terminology and definitions of sepsis vary across researchers studying machine learning sepsis models. The authors examined 18 papers on sepsis models. While they discuss the black-box issues of interpretability and explainability, the most interesting findings are the lack of specific clinical definitions of sepsis and disagreement over the optimal timing of antimicrobial therapy.
For model development, this means the gold standard of expert judgement for labeling sepsis in training data is jeopardized by the lack of agreement among clinical experts on sepsis definitions and optimal treatment protocols.
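The extent of that disagreement can be measured before any model is trained. Cohen’s kappa is a standard chance-corrected statistic for agreement between two annotators; a minimal sketch using hypothetical sepsis/no-sepsis labels from two reviewers:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Raw fraction of charts where the two reviewers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, given each reviewer's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels (1 = sepsis, 0 = no sepsis) on 10 charts:
a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
b = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.4, only moderate agreement
```

When kappa between experts is low, a model trained on either expert’s labels inherits that ambiguity, no matter how sophisticated the architecture.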
Outputs of sepsis models may also vary with the type of model. If clinicians lack access to a model’s computational details, they may not know how heavily to rely on it for a given patient. Once again, early engagement with end users can help address the skepticism and uncertainty clinicians may have about a novel technology.
One approach to many of these issues is the “research first” approach that Bayesian Health is using for its sepsis model. They built and tested the model over two years with providers from 5 different health systems before launching the platform more widely within EHR systems. The result: antibiotic treatment for sepsis began 1.85 hours sooner, with sustained, higher adoption by physicians and nurses (an 89% adoption rate).
The ESM controversy created quite a stir in the medical community, given Epic’s dominant market position and the details of the University of Michigan evaluation. It is always a good thing to shed light on the potential failure of an important clinical algorithm whose biases could harm patients or obstruct the work of clinicians.
But the alarm bells over the initial reporting overlook a number of scientific and socio-cultural issues that AI/ML developers will need to contend with, issues about which our current regulatory system has relatively little to say.
We’re in unregulated waters here. It could be a very good time to start paying attention to the deeper socio-technical twists and turns that arise from close observation and study. Even our language of ethics, replete with “thou shalt” and “thou shalt not,” does not fully capture the extent of the challenges in clinical adoption of AI/ML. Insights from the literature on socio-technical systems and the social sciences will likely get us much further in creating systems that work for stakeholders and reduce harm to patients and broader society. As the Bayesian Health case study has shown, adoption of clinical AI systems will follow a very different trajectory from traditional software/algorithm-based clinical decision support systems and will need to include end users from the beginning.