AI in Regulated Industries: Why Defensibility Is the Work, Not the Feature
People outside regulated industries often assume the hardest part of the job is making the right decision. In practice, the harder part is explaining that decision later to someone who was not there, does not know the context, and is paid to be skeptical.
In medical devices, life sciences, finance, and aviation, an answer is never final. It is a position that must survive audits, reviews, and questions that arrive months or even years after the work was done. Every conclusion must be anchored to evidence that still makes sense when revisited.
This reality shapes how artificial intelligence is adopted in these environments, and it explains why so many AI initiatives stall once they move beyond demonstrations. The problem is rarely model performance. It is whether the system leaves behind reasoning that can be defended.
AI is often framed as a way to reduce complexity. In regulated settings, complexity itself is not the enemy. The real risk is output that cannot be traced, verified, or reconstructed. A system that sounds confident but cannot show where its conclusions came from does not reduce risk. It creates it.
Over time, this leads to a practical conclusion. In regulated industries, defensibility is not something you add after the system works. It is the work itself.
Why “Accuracy” Is the Wrong Starting Point
When teams evaluate AI for compliance or regulatory support, the first question is usually about accuracy. That question is understandable, but incomplete.
In practice, regulatory professionals do not treat accuracy as a binary outcome. They treat it as a chain of defensibility. An answer can be technically correct and still rejected if its source cannot be confirmed, if it relies on an outdated standard, or if the reasoning cannot be reconstructed later.
This is easy to miss if you have never sat through an audit or responded to a formal request for clarification. In those moments, nobody cares how fast the answer was generated. What matters is whether you can point to the exact paragraph, version, and context that justified the decision at the time.
That is why many AI systems fail quietly in regulated settings. They optimize for fluency and speed, while their users optimize for verification and control.
One regulatory lead described an audit where an AI-generated summary was technically accurate but unusable. When the auditor asked where a specific requirement came from, the system could only respond that it was “based on FDA guidance.” That lack of specificity triggered a manual re-review of the entire submission. The failure was not the content of the answer, but the inability to defend it.
In another case, a team relied on an AI tool to surface applicable standards. Months later, they discovered the system had been referencing a superseded version of a key document. Nothing had failed loudly. The output looked reasonable. The problem only surfaced when reviewers noticed wording that no longer existed in the current revision. Fixing it required retracing weeks of work.
Where General AI Tools Break Down
Most failures follow the same pattern.
The first issue is opacity. A model produces a clean, confident statement, but there is no clear link back to the underlying source. Even when retrieval is involved, the citations are vague or inconsistent. For regulated work, that alone is enough to disqualify the output.
The second issue is instability. Ask the same question twice and the system surfaces different supporting documents. From an engineering perspective, this might be acceptable. From a compliance perspective, it is a red flag. Consistency is part of due diligence.
The third issue emerges more slowly. Regulatory content changes over time, often without fanfare. A clause is revised. A standard is superseded. A guidance document is updated quietly. An AI system that does not track these changes does not just become outdated. It becomes misleading, because it continues to answer with confidence.
Finally, there is the temptation to automate judgment. Many teams push AI to do more than it should, asking it to interpret requirements or resolve ambiguity. In regulated environments, this is where trust breaks down fastest. Professionals are comfortable delegating search and organization. They are not comfortable delegating accountability.
Figure 1 illustrates where this breakdown typically occurs. Traditional AI pipelines move directly from unstructured documents to model outputs, skipping the layers that regulated work depends on: version control, structured extraction, stable retrieval, and explicit citation. Without these layers, even strong models produce results that cannot withstand scrutiny.
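To make those layers concrete, here is a minimal sketch in Python. Every name in it is invented for illustration; the point is that a source passage carries an explicit revision, and a version layer drops anything superseded before the model ever sees it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Passage:
    doc_id: str     # identifier of the source document
    revision: str   # exact revision the excerpt was taken from
    paragraph: str  # paragraph or table reference within that revision
    text: str       # verbatim excerpt

# Hypothetical registry of which revision of each document is current.
CURRENT_REVISIONS = {"GUIDANCE-001": "rev-3", "STD-12345": "2021"}

def version_filter(passages: list[Passage]) -> list[Passage]:
    """Version layer: silently superseded text is the failure mode,
    so anything not matching the current revision is dropped."""
    return [p for p in passages if CURRENT_REVISIONS.get(p.doc_id) == p.revision]
```

Because every passage names its revision and paragraph, explicit citation falls out of the data model instead of being bolted on afterward.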
Defensibility Is Built, Not Claimed
In practice, defensibility behaves less like a model attribute and more like infrastructure. It is assembled piece by piece, and each piece limits what the system is allowed to do.
One of the most effective design decisions we saw was deceptively simple: no citation, no answer. If the system could not point to a specific paragraph or table, it stayed silent. That rule alone changed user behavior. Conversations shifted from debating phrasing to inspecting evidence.
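As a sketch, the rule can be enforced as a hard gate in front of whatever the model drafts. The citation shape below is made up; what matters is paragraph-level specificity or silence.

```python
def gated_answer(draft: str, citations: list[tuple[str, str, str]]) -> str:
    """Enforce 'no citation, no answer'.

    Each citation is (document_id, revision, paragraph) -- hypothetical
    fields standing in for whatever the retrieval layer returns."""
    if not citations:
        return "No answer: no paragraph-level source could be confirmed."
    refs = "; ".join(f"{doc} {rev}, para. {para}" for doc, rev, para in citations)
    return f"{draft}\n\nSources: {refs}"
```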
Another critical element was retrieval stability. Deterministic retrieval may not sound exciting, but it makes reviews possible. When the same question leads to the same sources, teams can reason about changes over time instead of questioning the system itself.
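Determinism here is less about the model and more about removing hidden variation in retrieval. A toy sketch, assuming a frozen index snapshot: ties are broken by document identifier so the same question always surfaces the same sources in the same order, and the snapshot is fingerprinted so a reviewer knows exactly which index produced an answer.

```python
import hashlib

def retrieve(query: str, index: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Toy keyword scorer standing in for a real one. Deterministic:
    same query + same index snapshot => same documents, same order."""
    terms = set(query.lower().split())
    scored = [(doc_id, float(len(terms & set(text.lower().split()))))
              for doc_id, text in index.items()]
    scored = [s for s in scored if s[1] > 0]
    scored.sort(key=lambda s: (-s[1], s[0]))  # score desc, then doc_id: ties never reorder
    return scored[:k]

def index_fingerprint(index: dict[str, str]) -> str:
    """Hash the snapshot so a review can confirm which index answered."""
    h = hashlib.sha256()
    for doc_id in sorted(index):
        h.update(doc_id.encode())
        h.update(index[doc_id].encode())
    return h.hexdigest()[:12]
```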
Equally important were boundaries. The AI was allowed to summarize, organize, and highlight. It was explicitly not allowed to reinterpret regulations, resolve conflicts on its own, or guess when information was missing. When uncertainty existed in the source material, the system surfaced it instead of smoothing it over.
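Those boundaries can be structural rather than a matter of prompt wording. In this sketch the operation names are invented; the shape is an allow-list, with everything outside it refused outright.

```python
def summarize(text: str) -> str:
    # Stand-in for a model call; a real system would invoke the LLM here.
    return text[:200]

# Delegated scope: search-and-organize tasks only.
HANDLERS = {"summarize": summarize}  # "organize" and "highlight" omitted for brevity

def run(operation: str, payload: str) -> str:
    """Anything outside the allow-list -- interpretation, conflict
    resolution, filling gaps -- is refused, not attempted."""
    if operation not in HANDLERS:
        raise PermissionError(f"'{operation}' requires human judgment; not delegated.")
    return HANDLERS[operation](payload)
```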
Finally, the workflow mattered. Outputs were designed to be reviewed, edited, and sometimes rejected by humans. This kept authority where it belongs while still removing hours of manual searching.
What This Looks Like in Medical Devices
In medical device regulation, these principles show up quickly.
A useful system does not try to “decide” whether a device qualifies for a pathway. Instead, it assembles the surrounding context: the classification history, the relevant standards, excerpts from guidance documents, and notes on where language has changed between revisions.
When documentation conflicts, the system flags it. When a standard was updated recently, it highlights the date. When the text itself is ambiguous, it says so plainly.
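In code terms, the output of such a system is an evidence packet, not a verdict. The field names here are invented, but the structure captures the behavior described above: conflicts and ambiguities are carried forward as data rather than resolved away.

```python
from dataclasses import dataclass, field

@dataclass
class EvidencePacket:
    question: str
    standards: list[str] = field(default_factory=list)         # applicable standards, with dates
    guidance_excerpts: list[str] = field(default_factory=list)  # verbatim, cited text
    revision_notes: list[str] = field(default_factory=list)    # where wording changed between revisions
    conflicts: list[str] = field(default_factory=list)         # flagged, never silently resolved
    ambiguities: list[str] = field(default_factory=list)       # stated plainly for the reviewer

    def needs_human_review(self) -> bool:
        return bool(self.conflicts or self.ambiguities)
```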
This behavior builds confidence, not because the AI is always right, but because it is predictable. Over time, teams begin to rely on it as a way to reduce blind spots, not as a replacement for expertise.
This Is Bigger Than Medtech
The same dynamics apply in finance, aviation, energy, and legal compliance. These fields do not adopt AI because it is novel. They adopt it when it makes outcomes safer, reviews faster, and decisions easier to defend.
Across these domains, the winning systems share a common trait. They treat evidence, provenance, and version awareness as first-class concerns. The model is important, but it is not the center of gravity.
A Practical Path Forward
Organizations trying to apply AI in regulated environments can make progress without waiting for new breakthroughs.
Start by improving the structure of your documents and data. Evaluate AI tools based on traceability, not cleverness. Design workflows that assume human review rather than trying to eliminate it. Treat AI as a research assistant whose job is to gather and organize evidence, not to make final calls.
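One way to make "traceability, not cleverness" testable during tool evaluation is a probe like the sketch below. The answer shape is assumed for illustration, not any vendor's API: a tool fails the probe if an answer lacks citations or cites a revision that is no longer current.

```python
def traceability_probe(answer: dict, current_revisions: dict[str, str]) -> list[str]:
    """Assumed answer shape (illustrative only):
    {"text": "...", "citations": [{"doc_id": "...", "revision": "..."}]}"""
    failures = []
    citations = answer.get("citations", [])
    if not citations:
        failures.append("answer carries no citations at all")
    for c in citations:
        if current_revisions.get(c["doc_id"]) != c["revision"]:
            failures.append(f'{c["doc_id"]}: cites non-current revision {c["revision"]}')
    return failures  # empty list means the answer is at least traceable
```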
No one expects AI to simplify regulation. What it can do is reduce the friction around it, if it is built with the right constraints.
In regulated industries, defensibility determines what survives contact with reality and what quietly disappears. AI is no exception. The teams that treat defensibility as the core engineering problem will be the ones that see durable results.
Beng Ee Lim and Mert Zamir work at the intersection of artificial intelligence and regulatory systems, with hands-on experience building AI tools for highly regulated industries including medical devices and life sciences. Their work focuses on trustworthy AI design, data integrity, and practical deployment in compliance-driven environments.