Recently the FDA, Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) released their “Guiding Principles for Good Machine Learning Practice” to help the AI/ML industry balance patient safety with continued innovation in new devices and AI/ML algorithms. These principles are derived from those used in the medical device field and other sectors to help guide best practices.
The principles include the following:
- The total product life cycle uses multidisciplinary expertise.
- The model design is implemented with good software engineering and security practices.
- Participants and data sets represent the intended patient population.
- Training data sets are independent of test data sets.
- Selected reference data sets are based upon best available methods.
- Model design is tailored to the available data and reflects intended device use.
- Focus is placed on the performance of the human-AI team.
- Testing demonstrates device performance during clinically relevant conditions.
- Users are provided clear, essential information.
- Deployed models are monitored for performance, and retraining risks are managed.
Several of these principles (the first, third, fourth, and fifth) are at least partly intended to address bias, which can degrade the performance of AI/ML tools across diverse populations. Note that these guidelines apply to software intended to treat, diagnose, cure, mitigate, or prevent diseases or other conditions that fall under the FDA’s regulatory umbrella.
Bias has become a growing concern in recent years after several models used in the industry were found to exhibit substantial racial bias. The other notable aspect of the Guiding Principles is the role that international cooperation played in their development. The principles also reflect recent WHO policy guidance on ethics and AI, which proposed six ethical principles, including:
- Human control (in the loop)
- Accountability (sometimes through transparency/explainability)
By providing broad-based guideposts to the industry, the FDA and its partner agencies have developed an important set of guidelines. For those in the trenches of device development, however, these principles may not be enough to ensure patient safety, protect against bias, or address the other ethical issues that emerge in practice.
The Guiding Principles should be viewed as a start, not a definitive framework, for how the FDA will regulate AI/ML-based SaMD in the future. Here are three major issues facing industry-wide implementation of these principles.
- The list is long on principle and short on detail. The 10 principles reflect a growing consensus around broad principles that industry can use to balance patient safety with innovation. More explicit details are still needed to address some of the most controversial aspects of AI/ML in healthcare, and supplying them will be a Herculean effort in a fast-moving space where legislation often lags well behind the pace of innovation.
- The field of AI/ML is still seeing a large number of models entering the marketplace with problematic biases. Although the FDA has cleared over 300 SaMD (Software as a Medical Device) submissions and algorithms to date, it is becoming clear that problems of racial and cultural bias persist. The guidelines are a sign of accelerating activity at the FDA, but defining specific parameters for demonstrating safety and the absence of bias, whether intended or unintended, continues to prove elusive at the regulatory level.
- The gap between guidelines and practice remains wide, which means industry, civil society, and government need to cooperate to fill it. Industry attempts to address the trust issue with tools such as explainable AI have produced mixed results at best. Translating these guidelines into concrete frameworks for vendors and users will require a substantial amount of research and experimentation. If successful, that work will power the next generation of actionable and effective tools for improving patient safety, mechanisms for adjudicating harm, and the sustainability needed to build trust.
Conclusion: Operationalizing Principles into Practice
According to listings through mid-2021, the FDA has cleared well over 300 AI/ML-based algorithms and devices. Notable authorizations have included:
- IDx-DR for detection of diabetic retinopathy
- OsteoDetect for diagnosing wrist fractures
- Guardian Connect System for continuous glucose monitoring
- Zebra Medical Vision for vertebral fracture and osteoporosis detection
This is just a small sampling of the range of AI/ML applications that have received clearances in recent years. Because not all companies label their products as AI/ML-based, it can be difficult to assess the true number of AI/ML SaMD devices currently cleared, whether through the 510(k) clearance process or otherwise.
While the growing number of clearances is one positive sign, we have also seen problematic models making it into the market and possibly leading to patient safety issues. From the problems with Epic’s sepsis detection model to the slew of algorithms for detecting COVID-19 from radiological images, a growing number of error-prone models have made it clear that more needs to be done both inside and outside the FDA.
Over a year ago, the New England Journal of Medicine published research highlighting a large number of clinical algorithms currently in use (not all of them AI/ML-based) that contain substantial racial bias. These biases are proving difficult to resolve. Measures such as “explainable AI,” along with AI applications that claim to root out bias, have been thrown at the problem without solving it in any substantive way.
Despite the best intentions, attempts at creating transparency have had major shortcomings because of confounding variables in complex models. In future coverage, Chilmark Research will examine these approaches in more depth, along with how the industry is attempting to address the trust issue.
It is encouraging to see the FDA accelerating its long-overdue thinking on the matter. However, we still have a long way to go before reaching consensus on standards for validation, replicability, bias, and the host of other issues that determine whether clinicians and patients trust AI/ML applications in their practice and care.
With the growing threat of misinformation in our biomedical and public health ecosystems, the industry should recognize that AI/ML could be only one conspiracy theory away from a challenge like vaccine misinformation, which would make building trust even more difficult. The time is right for civil society organizations, industry, and government to deepen the processes for data governance, model evaluation, and transparency in ways that reflect the growing importance of algorithms in our health systems.