Over 1,300 FDA-Cleared AI Devices and No Liability Framework: The Regulatory Gap in Diagnostic AI

The FDA has cleared more than 1,300 AI-enabled medical devices for the U.S. market, a total that has doubled since 2023 alone. Most are in radiology, where AI now reads mammograms, flags tumors, and triages CT scans at a scale no human workforce could match.

But here’s what the approval numbers don’t tell you: fewer than 2% of those devices were supported by randomized clinical trials. 97% were cleared through the 510(k) pathway, which requires only that a device demonstrate “substantial equivalence” to an existing product, not independent evidence of safety or clinical effectiveness. And there is no federal liability framework that defines who is responsible when a diagnostic AI system harms a patient.

The devices are approved. The evidence is thin. The legal accountability is undefined. For compliance teams at health systems, device manufacturers, and insurers, that gap is the story.

Diagnostic AI Is Outpacing Its Own Evidence Base

The growth is undeniable. The FDA cleared just 6 AI/ML devices in 2015. By 2023, that number hit 221 in a single year. In 2025, the FDA cleared a record 295 AI/ML medical devices. Radiology accounts for roughly 75–80% of all listed devices, with cardiology and neurology making up most of the rest.

And the technology works — sometimes remarkably well. In Sweden’s MASAI trial, the largest randomized study of AI in mammography screening to date, AI-supported screening detected more cancers than standard double reading by radiologists while reducing the screen-reading workload by 44%. The AI system triaged mammograms by risk score, routing only the highest-risk cases to double reading.
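To make that mechanism concrete, here is a minimal sketch of score-based routing of this kind. The score scale, cutoff, and function name are illustrative assumptions, not the MASAI protocol’s actual parameters.

```python
# Illustrative sketch of risk-score triage for screening mammography.
# The 1-10 scale and cutoff below are hypothetical assumptions, not the
# MASAI trial's actual protocol parameters.

HIGH_RISK_CUTOFF = 9  # hypothetical threshold on a 1-10 AI risk score

def route_mammogram(ai_risk_score: int) -> str:
    """Route a screening exam to single or double reading by AI risk score."""
    if ai_risk_score >= HIGH_RISK_CUTOFF:
        return "double reading"  # two radiologists review independently
    return "single reading"      # one radiologist reads; routing most exams
                                 # here is where the workload reduction comes from

print(route_mammogram(10))  # double reading
print(route_mammogram(4))   # single reading
```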

But the evidence is uneven. In a randomized trial published in Nature Medicine, an AI system helping general cardiologists assess complex genetic cardiomyopathies produced better assessments than specialists alone — yet 6.5% of its responses contained clinically significant hallucinations. The errors were only caught when a human cardiologist questioned the AI’s output directly.

And in a separate Nature Medicine evaluation, ChatGPT’s most advanced model triaged medical cases incorrectly more than half the time, telling patients who urgently needed emergency care to stay home. As cardiologist and researcher Eric Topol noted of that finding: “We have a long way to go.”

The CRAFT-MD framework, a research evaluation tool published in Nature Medicine in January 2025, found that all tested LLMs — including GPT-4 — showed significant diagnostic accuracy drops when moved from structured exam questions to realistic clinical conversations. GPT-4’s accuracy fell from 82% on static vignettes to 62.7% in conversational settings. Without predefined answer choices, it dropped further to 26.4%.

These are not fringe findings. They describe a structural gap: diagnostic AI performs well on the kinds of standardized tests used to get FDA clearance, and significantly worse in the kinds of open-ended clinical interactions that define real patient care.

The 510(k) Pathway Wasn’t Built for This

The 510(k) process asks a simple question: is this device substantially equivalent to something already on the market? It does not require clinical outcome data. It does not mandate prospective trials. It does not assess how a device performs when integrated into a real clinical workflow rather than a controlled validation dataset.

A JAMA Network Open study found that clinical testing of AI radiology devices remains uncommon even as approvals accelerate, and most devices have not been validated against defined clinical or performance endpoints. Only about 5% of AI devices have been the subject of any post-market adverse event report, and just 5–6% have ever been recalled, primarily for software bugs.

The FDA has signaled awareness of the problem. In January 2025, the agency issued draft guidance recommending a Total Product Life Cycle approach for AI-enabled devices, including model descriptions, data lineage, performance metrics, bias analysis, and human-AI workflow documentation. The guidance also calls for labeling that includes a clear statement that the device uses AI, details on inputs and outputs, performance measures, and any known risks or sources of bias.

But guidance is not regulation. The draft guidance remains under review, and it doesn’t retroactively address the 1,300-plus devices already on the market. For health systems deploying these tools today, the gap between what the FDA requires at clearance and what clinical practice demands is widening.

When Diagnostic AI Fails, Who Pays?

This is the question no one has answered. Geoffrey Hinton, the Nobel Prize-winning computer scientist known as the “Godfather of AI,” put the asymmetry plainly: if a doctor fails to use an available AI tool and a patient dies, no one is sued. If a doctor uses AI and harm follows, liability could be immediate. The system discourages early adoption.

Under current U.S. malpractice law, liability rests on the “reasonable physician under similar circumstances” standard. Whether AI was used or not, the physician typically bears responsibility for the clinical outcome. As one Johns Hopkins analysis put it, when AI contributes to a medical error, the legal system isn’t asking whether the algorithm failed — it’s asking what the physician did.

That creates a double bind. If AI is used only as decision support, the physician who makes the final determination bears the liability risk. But apportioning liability becomes far harder when the algorithm is a neural network whose internal workings are a black box to manufacturers and physicians alike.

Multiple liability theories overlap without resolving the question. Physicians and health systems face malpractice and negligence claims. Algorithm developers may face product liability. Hospitals face vicarious liability for their employees’ AI-assisted decisions. But no framework distributes responsibility in a way that accounts for the reality of how these tools work in practice.

Meanwhile, the scale of unaddressed human error is enormous. Topol cited estimates of at least 12 million diagnostic errors per year in the U.S., resulting in roughly 800,000 cases of disability or death. “We don’t tend to talk about that,” he said. “We keep talking about the mistakes the AI makes.”

The HTI-5 Rollback Makes It Worse

This regulatory gap doesn’t exist in isolation. The Trump Administration’s proposed HTI-5 rule would eliminate the federal model card requirement for certified health IT products — the closest thing the U.S. has to a standardized AI transparency mandate in healthcare.

Model cards disclose how a predictive AI tool was built, what data trained it, what populations it was tested on, and what risks are known. Without them, health systems lose one of the few tools available for evaluating whether an AI device is appropriate for their patient population before deploying it.
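As a rough illustration of what that disclosure looks like in practice, here is a minimal sketch of the fields a model card typically carries. The field names and values are hypothetical, not the certified health IT schema or any vendor’s actual card.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the disclosures a model card captures.
# Field names and values are hypothetical, not a regulatory schema.

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    training_data_sources: list[str]       # what data trained the model
    validation_populations: list[str]      # who it was tested on
    performance_metrics: dict[str, float]  # e.g., sensitivity, specificity
    known_risks: list[str] = field(default_factory=list)
    known_biases: list[str] = field(default_factory=list)

card = ModelCard(
    model_name="Example CT triage model",
    intended_use="Prioritize head CT reads for suspected hemorrhage",
    training_data_sources=["De-identified scans from three academic centers"],
    validation_populations=["Adults 18-90, single U.S. region"],
    performance_metrics={"sensitivity": 0.93, "specificity": 0.89},
    known_risks=["Lower sensitivity on low-dose scan protocols"],
)
```

A health system comparing that card against its own patient population can spot a mismatch, such as a pediatric hospital deploying a model validated only on adults, before the tool reaches clinical use.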

The California Attorney General has formally opposed the rollback. The Coalition for Health AI (CHAI) found that 90% of its members rate AI transparency as important, and CHAI’s applied model cards have been deployed across 36 health systems nationwide. The demand for this information exists, but the federal requirement to provide it may not survive.

What Compliance Teams Should Be Tracking

The regulatory picture is moving in several directions at once. The FDA is proposing better pre-market guidance. The administration is rolling back post-market transparency. Liability law hasn’t adapted. And AI devices are being deployed into clinical settings faster than any of these frameworks can keep pace.

For health systems, the practical implications are immediate. Every AI diagnostic tool in clinical use should be inventoried, with documented evidence of its validation data, known limitations, and the clinical workflow it’s embedded in. Vendor contracts should specify who bears responsibility for adverse outcomes, how model updates are governed, and what transparency the vendor provides about training data and bias testing.
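As a rough sketch of what one inventory entry might capture, assuming hypothetical field names rather than any regulatory schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of a single inventory entry for an AI diagnostic
# tool in clinical use. The fields mirror the items listed above and
# are illustrative, not a regulatory or contractual schema.

@dataclass
class AIDeviceRecord:
    device_name: str
    vendor: str
    clearance_pathway: str    # e.g., 510(k), De Novo, or none
    clinical_workflow: str    # where the tool sits in care delivery
    validation_evidence: str  # what validation data the vendor provided
    known_limitations: list[str]
    liability_terms: str      # who the contract says bears adverse-outcome risk
    update_governance: str    # how model updates are reviewed and approved

inventory: list[AIDeviceRecord] = [
    AIDeviceRecord(
        device_name="Example CT triage tool",
        vendor="ExampleVendor (hypothetical)",
        clearance_pathway="510(k)",
        clinical_workflow="Flags suspected intracranial hemorrhage on head CT",
        validation_evidence="Retrospective multi-site validation; no RCT",
        known_limitations=["Not validated on pediatric patients"],
        liability_terms="Unspecified in current contract",
        update_governance="Vendor pushes updates quarterly; no local review",
    ),
]
```

Even a lightweight registry like this forces the questions that matter: whether validation evidence actually exists, and what the contract really says about who pays when something goes wrong.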

For device manufacturers, the January 2025 FDA draft guidance — even in its current non-final form — signals where regulatory expectations are heading. Companies that build Predetermined Change Control Plans, maintain post-market surveillance, and document bias analysis now will be ahead of the curve when final rules arrive.

For everyone: the absence of a clear liability framework isn’t a signal that the risk doesn’t exist. It’s a signal that the risk hasn’t been priced yet. When it is — through litigation, through legislation, or through a high-profile patient harm case — the organizations that can demonstrate documented governance will be the ones best positioned.

There are over 1,300 AI devices on the market. Fewer than 2% have clinical trial backing. There is no federal standard defining who is responsible when they fail.

That gap will close eventually. The question is whether your organization is ready when it does.


AI Compliance Insider covers the regulations, tools, and incidents shaping AI governance. Subscribe for weekly updates.