Regulatory Frameworks for Medical AI
AI in medicine must navigate rigorous regulatory frameworks designed to protect patients — understanding these frameworks is essential for anyone bringing AI tools into clinical practice.
Software as a Medical Device: the foundational concept
The regulatory journey for medical AI begins with a definitional question: when does software become a medical device? The FDA's answer, developed through years of guidance documents, is the concept of Software as a Medical Device (SaMD) — defined internationally by the International Medical Device Regulators Forum (IMDRF) as "software intended to be used for one or more medical purposes that perform these purposes without being part of a hardware medical device."
The critical element is the word "intended." Software that provides clinical decision support (recommending a diagnosis, suggesting a treatment, identifying a pathology in an image) is intended to influence clinical decisions and therefore generally falls within the SaMD category. Software that merely stores or transfers data, without analyzing or interpreting it for clinical purposes, generally does not. The line between "clinical decision support" and "administrative software" has been a persistent source of regulatory ambiguity as AI capabilities have expanded.
The FDA further classifies medical devices (including SaMD) by risk level. Class I devices pose minimal risk and are subject to general controls. Class II devices pose moderate risk and require special controls, typically including 510(k) pre-market notification. Class III devices pose the highest risk — supporting or sustaining life, or preventing health impairment — and require the most rigorous pre-market approval (PMA) pathway. Where a given AI tool falls within this classification determines the entire regulatory pathway it must traverse.
Pre-market pathways: 510(k), De Novo, and PMA
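Three primary FDA pre-market pathways apply to medical AI, and anyone developing or evaluating such products needs to know how they differ. A 510(k) submission seeks clearance by demonstrating substantial equivalence to a legally marketed predicate device. The De Novo pathway covers novel low-to-moderate-risk devices for which no predicate exists. PMA, the most demanding of the three, requires direct evidence of safety and effectiveness and applies to Class III devices.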
The FDA's AI/ML Action Plan
In January 2021, the FDA released its Artificial Intelligence/Machine Learning-Based Software as a Medical Device Action Plan, acknowledging that existing regulatory frameworks were not designed for a fundamentally new kind of software: one that could change its behavior after deployment by continuing to learn from real-world data. The action plan identified five areas of focus that have since shaped FDA engagement with medical AI.
Central to the action plan was the concept of a Predetermined Change Control Plan (PCCP): a mechanism allowing manufacturers to specify in advance the types of changes that might be made to an AI algorithm post-clearance and the validation protocols that would govern those changes. Rather than requiring a new 510(k) submission every time a continuously learning algorithm updates its weights, the PCCP framework would allow pre-approved types of updates to proceed without triggering full re-review. The FDA issued draft guidance on PCCPs in 2023, moving this concept closer to operational implementation.
Traditional medical device regulation assumes a fixed product: you test the device, clear it, and the device you tested is the device that reaches patients. Adaptive AI algorithms violate this assumption. A neural network that continues learning on real-world clinical data may perform quite differently six months after deployment than it did at the time of clearance. The regulatory challenge is profound: how do you ensure ongoing safety and effectiveness for a product that is, by design, continuously changing? The PCCP framework is the FDA's evolving answer — but it remains a work in progress.
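The logic of a predetermined change control plan can be sketched in a few lines. This is an illustration only: the names `ChangeControlPlan`, `ProposedUpdate`, and `within_plan`, the change-type labels, and the numeric floors are all hypothetical and do not reflect the contents of any actual FDA submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeControlPlan:
    """Hypothetical, heavily simplified stand-in for a PCCP."""
    allowed_change_types: frozenset  # change types specified in advance
    min_sensitivity: float           # performance floor under the plan's protocol
    min_specificity: float


@dataclass(frozen=True)
class ProposedUpdate:
    change_type: str
    sensitivity: float  # measured with the plan's validation protocol
    specificity: float


def within_plan(plan: ChangeControlPlan, update: ProposedUpdate) -> bool:
    """True if the update is a pre-specified change type and meets both floors."""
    return (
        update.change_type in plan.allowed_change_types
        and update.sensitivity >= plan.min_sensitivity
        and update.specificity >= plan.min_specificity
    )


# Illustrative values only.
plan = ChangeControlPlan(
    allowed_change_types=frozenset({"retrain_same_architecture"}),
    min_sensitivity=0.90,
    min_specificity=0.85,
)
retrain = ProposedUpdate("retrain_same_architecture", 0.93, 0.88)
new_indication = ProposedUpdate("new_intended_use", 0.95, 0.90)

print(within_plan(plan, retrain))         # True
print(within_plan(plan, new_indication))  # False
```

An update that falls outside the plan, such as the change of intended use in this example, would go back through a regular submission even if its measured performance were better.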
Locked versus adaptive algorithms
The regulatory framework draws a critical distinction between locked and adaptive AI algorithms. A locked algorithm does not change its behavior after deployment — the weights of the neural network are frozen, and the model produces the same output for the same input regardless of how much additional data it has encountered since clearance. Regulatory review of a locked algorithm is straightforward in principle: what you test is what you clear, and what you clear is what patients receive.
An adaptive algorithm, by contrast, continues to modify its behavior based on new data encountered in deployment. Continuous learning algorithms may update their weights in real time or on a rolling schedule, potentially improving performance but also potentially drifting in ways that reduce performance for specific subpopulations or introduce new failure modes. The model a patient encounters in month twelve may be meaningfully different from the model that was cleared in month one.
Most medical AI products currently cleared by the FDA use locked algorithms specifically because the regulatory pathway is clearer. The development of frameworks for adaptive algorithms, with defined monitoring requirements, performance bounds, and change control procedures, is one of the most actively contested questions in medical AI regulation.
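One practical consequence of the locked/adaptive distinction is that a locked model can be checked for exact identity with the cleared version. The following is a minimal sketch, assuming the weights are available as a mapping from layer name to a list of floats; the `fingerprint` helper and the toy weights are hypothetical, and a real deployment would hash the serialized model artifact instead.

```python
import hashlib
import struct


def fingerprint(weights: dict) -> str:
    """Return a SHA-256 hex digest of the weights, visiting layers in sorted order."""
    digest = hashlib.sha256()
    for name in sorted(weights):
        digest.update(name.encode("utf-8"))
        for value in weights[name]:
            # Pack each weight as a big-endian 64-bit float for a stable encoding.
            digest.update(struct.pack(">d", value))
    return digest.hexdigest()


# Toy weights, invented for illustration.
cleared = {"layer1": [0.12, -0.50], "layer2": [1.75]}
deployed_locked = {"layer2": [1.75], "layer1": [0.12, -0.50]}   # same weights
deployed_adapted = {"layer1": [0.12, -0.49], "layer2": [1.75]}  # one weight moved

print(fingerprint(cleared) == fingerprint(deployed_locked))   # True
print(fingerprint(cleared) == fingerprint(deployed_adapted))  # False
```

For an adaptive algorithm the fingerprint changes with every update, so identity checks give way to the monitoring and performance bounds discussed above.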
European regulation: CE marking and the EU AI Act
In the European Union, medical AI devices are regulated under the Medical Device Regulation (MDR, EU 2017/745) and the In Vitro Diagnostic Medical Devices Regulation (IVDR, EU 2017/746), which replaced the previous Medical Device Directive framework with substantially more rigorous requirements. CE marking (the conformité européenne mark indicating conformity with EU standards) is required for medical devices placed on the European market.
The EU AI Act, adopted in 2024, adds a further regulatory layer specific to artificial intelligence. Crucially, AI systems regulated as medical devices, which covers most software used in medical diagnosis, prognosis, or patient management, generally fall into the Act's high-risk category, a classification that triggers its most extensive compliance obligations. High-risk AI systems must meet requirements for risk management systems, data governance, technical documentation, transparency to users, human oversight provisions, accuracy and robustness standards, and registration in a public EU database of high-risk AI systems.
The interaction between the MDR/IVDR and the EU AI Act creates a complex dual-compliance challenge for medical AI developers: a product may need to demonstrate CE conformity under the MDR and simultaneously meet the high-risk AI requirements of the AI Act. The European AI Office is responsible for coordinating the Act's implementation, and guidance on how the two regulatory regimes interact for medical AI specifically remains an area of active development.
Post-market surveillance requirements
Regulatory approval is not the end of regulatory engagement — it is the beginning of an ongoing post-market relationship between the manufacturer, the regulator, and the clinical community. Both FDA and EU frameworks impose post-market surveillance requirements that are particularly significant for AI tools given their potential for performance drift.
Under FDA requirements, manufacturers of cleared AI devices must maintain a Quality Management System (QMS), track complaints and adverse events, and submit Medical Device Reports (abbreviated MDRs in FDA usage, not to be confused with the EU MDR) for events where device malfunction may have caused or contributed to serious patient injury or death. Post-market studies may be required as a condition of marketing authorization, particularly for De Novo grants or PMA approvals, to generate real-world evidence of performance in broader populations than were studied pre-market.
For AI specifically, post-market surveillance must contend with the challenge of performance drift — the gradual degradation of AI model performance as the real-world data distribution shifts away from the training distribution. A diabetic retinopathy detection model trained on images from specific camera systems may perform poorly when deployed with different equipment. A sepsis prediction model trained on pre-pandemic data may perform differently in a post-pandemic patient population. Detecting drift requires ongoing monitoring against ground truth labels — which in clinical settings means waiting for clinical outcomes that may not be available for weeks or months.
Silent degradation. AI model performance can degrade gradually and invisibly. Unlike a malfunctioning infusion pump, which may generate an alarm, a neural network that has drifted in its accuracy produces no obvious error signal — it simply generates outputs that are increasingly wrong. Without active monitoring against clinical outcomes, drift can persist undetected for months while patients are affected.
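Active monitoring against clinical outcomes can be sketched as a rolling check that raises an alarm when sensitivity over recent confirmed cases falls below a floor. This is an illustrative sketch, not a validated surveillance method: the class name `RollingSensitivityMonitor`, the window size, and the floor are assumptions chosen for the example.

```python
from collections import deque


class RollingSensitivityMonitor:
    """Tracks sensitivity over the most recent confirmed-positive cases."""

    def __init__(self, window: int, floor: float):
        self.floor = floor
        # Each entry is True if the model flagged a case later confirmed positive.
        self.hits = deque(maxlen=window)

    def record(self, predicted_positive: bool, confirmed_positive: bool) -> None:
        """Record a case once its clinical outcome is known."""
        if confirmed_positive:  # sensitivity only involves truly positive cases
            self.hits.append(predicted_positive)

    def sensitivity(self):
        """Return sensitivity over the window, or None if no positives yet."""
        if not self.hits:
            return None
        return sum(self.hits) / len(self.hits)

    def alarm(self) -> bool:
        """True once the window is full and sensitivity is below the floor."""
        full = len(self.hits) == self.hits.maxlen
        return full and self.sensitivity() < self.floor


monitor = RollingSensitivityMonitor(window=4, floor=0.75)
for predicted in [True, True, True, True]:
    monitor.record(predicted, confirmed_positive=True)
print(monitor.alarm())  # False

for predicted in [False, False]:  # two missed cases enter the window
    monitor.record(predicted, confirmed_positive=True)
print(monitor.sensitivity(), monitor.alarm())  # 0.5 True
```

Because `record` can only be called once an outcome is confirmed, the alarm inherits whatever delay the clinical ground truth has, which is exactly the limitation described above.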
Distribution shift in practice. Changes in patient population demographics, camera equipment, scanning protocols, clinical practice patterns, and electronic health record systems can all shift the data distribution away from the training domain, degrading model performance even when the model's weights have not changed.
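Input-side shift of this kind can be screened for without waiting for outcome labels. The sketch below uses the Population Stability Index (PSI), a common drift statistic that the text does not itself name, to compare a single input feature between a reference sample and current data. The function name, bin edges, and toy values are assumptions for illustration, and any alert threshold would need to be justified for the specific device.

```python
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges.

    Values outside the edges are ignored; each sample is assumed to have
    at least one value inside them.
    """

    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                # Bins are half-open [lo, hi), except the last, which is closed.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # eps keeps the logarithm finite when a bin is empty.
        return [max(c / total, eps) for c in counts]

    ref = proportions(reference)
    cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy values standing in for one normalized image statistic.
edges = [0.0, 0.5, 1.0]
reference = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
shifted = [0.1, 0.6, 0.7, 0.8, 0.9, 0.6, 0.7, 0.8]

print(psi(reference, reference, edges))          # 0.0
print(round(psi(reference, shifted, edges), 2))  # 0.73
```

A statistic like this only shows that inputs have moved, not that accuracy has dropped, so it complements outcome-based monitoring without replacing it.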
Clinical validation requirements
What evidence is required to demonstrate that a medical AI tool is safe and effective? The answer depends on the regulatory pathway and the risk classification, but certain principles apply across frameworks.
Clinical validation must demonstrate performance in the intended use population — not just in carefully curated datasets that may not reflect the diversity of real-world clinical practice. A skin lesion classification model trained primarily on images from light-skinned individuals must be validated on diverse skin tones before it can credibly claim performance across the full patient population. A chest X-ray AI trained on images from academic medical centers must be validated in community hospital settings with different patient demographics and image acquisition protocols.
Reader study design — comparing AI performance against physician performance on a held-out test set — has become a standard validation approach. The appropriate comparator is debated: should AI be compared to a single physician, a panel of specialists, the average physician, or the best available care? The choice of comparator significantly affects whether a tool appears to meet the performance bar, and it has real implications for how beneficial AI tools are evaluated.
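The effect of the comparator can be seen in a toy calculation. The reader and AI calls below are invented purely for illustration, and the helper names `accuracy` and `majority_vote` are hypothetical; the point is only that the same AI output can beat every individual reader and still fall short of a panel.

```python
def accuracy(predictions, truth):
    """Fraction of cases where the prediction matches the reference standard."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def majority_vote(reader_predictions):
    """Per-case majority call across readers; assumes an odd number of readers."""
    n = len(reader_predictions)
    return [sum(votes) * 2 > n for votes in zip(*reader_predictions)]


# Invented data: six cases, three readers, one AI tool.
truth = [True, True, True, False, False, False]
readers = [
    [True, True, False, False, False, True],
    [True, False, True, False, True, False],
    [False, True, True, True, False, False],
]
ai = [True, True, True, False, False, True]

individual = [accuracy(r, truth) for r in readers]
panel = accuracy(majority_vote(readers), truth)

print(accuracy(ai, truth) > max(individual))  # True: AI beats each single reader
print(accuracy(ai, truth) < panel)            # True: AI trails the panel majority
```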
The strongest validation studies for medical AI share common features: prospective design or rigorously constructed retrospective holdout sets; demographically diverse and clinically representative patient populations; performance stratified by relevant subgroups (age, sex, race/ethnicity, disease severity, imaging equipment); comparison to relevant clinical standards including specialist performance; and pre-registered analysis plans that prevent post-hoc performance cherry-picking. These standards are achievable, and they distinguish regulatory-grade evidence from the performance claims that appear in many AI product marketing materials.
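Stratifying performance by subgroup is straightforward to compute once predictions, reference labels, and subgroup membership are recorded together. The following is a minimal sketch with invented records; the function name `stratified_metrics` and the subgroup labels "A" and "B" are assumptions for the example.

```python
from collections import defaultdict


def stratified_metrics(records):
    """Compute sensitivity and specificity per subgroup.

    Each record is (subgroup, predicted_positive, actually_positive).
    A metric is None when its denominator is zero for that subgroup.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for subgroup, predicted, actual in records:
        if actual:
            key = "tp" if predicted else "fn"
        else:
            key = "fp" if predicted else "tn"
        counts[subgroup][key] += 1

    results = {}
    for subgroup, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[subgroup] = {
            "n": positives + negatives,
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Invented records: (subgroup, model call, reference standard).
records = [
    ("A", True, True), ("A", True, True), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, False), ("B", True, False),
]

for subgroup, metrics in sorted(stratified_metrics(records).items()):
    print(subgroup, metrics)
```

In this toy data the pooled sensitivity is 0.75, which hides the fact that subgroup B sits at 0.5 while subgroup A is at 1.0. That gap is what a stratified report is meant to expose.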
International harmonization challenges
Medical AI developers seeking global market access must navigate regulatory frameworks that differ substantially in their structure, requirements, and timelines — and that are each evolving rapidly and somewhat independently. FDA clearance does not confer CE marking in Europe; a product cleared in Japan under the Pharmaceuticals and Medical Devices Agency (PMDA) framework may require separate review in Canada under Health Canada. Each jurisdiction has different predicate requirements, evidence standards, and post-market obligations.
The IMDRF has worked to develop internationally harmonized guidance on SaMD, including a risk-based classification framework that has influenced regulatory approaches across multiple jurisdictions. International standards bodies have also published standards relevant to AI in healthcare, notably ISO/IEC 42001 for AI management systems and ISO 14971 for medical device risk management, that provide a common technical language across regulatory contexts. However, the gap between high-level harmonization principles and the operational detail of specific regulatory submissions remains substantial.
For healthcare systems evaluating AI tools, the regulatory landscape has an important implication: FDA clearance or CE marking is a necessary but not sufficient indicator of clinical value. These approvals demonstrate that a product met the evidentiary standard for the regulatory pathway chosen, which may be equivalence to an existing predicate under 510(k) rather than demonstrated clinical effectiveness. Clinical procurement decisions benefit from independent evaluation of the clinical validation evidence, not just the regulatory approval status.