Risk Should Determine Your AI Architecture


Why risk must determine the AI you buy — and why most organizations are buying blind.

Source: Phaedra Boinodiris

When a board asks its CIO what AI governance looks like, the answer is usually a policy document, an ethics framework, or a list of principles. What it almost never is: a procurement standard.

That gap is where the real risk lives.

Every time your organization deploys an AI system, a critical decision gets made — usually by a developer, sometimes by a vendor, rarely by anyone with accountability for the outcome. That decision is: what kind of AI are we using? And in most enterprises today, the honest answer is: we don't actually know, and we don't have the language to find out.

Selecting an AI architecture is not a technical preference. It is a risk-calibrated governance decision that determines auditability, accountability, and consequence. That decision determines not just system behavior, but who is legally and operationally accountable when it fails.

This piece is about changing that. Because the form of AI you deploy should be determined by the risk of the use case it serves — and your procurement language needs to reflect that.

THE PROBLEM

Most organizations treat all AI as the same product

Walk into any enterprise technology conversation today and you will hear AI described as though it is a single category of thing. Organizations talk about "deploying AI," "governing AI," "auditing AI" — as if the word names a uniform technology with consistent properties.

It doesn't.

The AI market currently contains at least three architecturally distinct types of system, each with radically different properties for governance, explainability, and accountability:

•        Probabilistic AI — statistical pattern recognition engines (most large language models, generative AI, most modern ML). The same input can produce a different output across runs. Explainability is partial and model-dependent. Causal attribution is difficult.

•        Probabilistic AI with context orchestration (and controls) — probabilistic models constrained by defined data sources and policy routing logic. You can control what the model reasons from. You cannot guarantee identical outputs under identical conditions.

•        Deterministic AI — logic-governed, rule-based, or ontology-driven systems. Every result can be verified against defined logic. Outputs are designed to be consistent and reproducible, with a full trace possible by design: what data, which rules, why that result.

In practice, most enterprise deployments combine these architectures — a probabilistic model sitting inside a deterministic routing layer, inside a retrieval pipeline with policy guardrails. The governance obligation is to know which components operate under which properties, and to hold each component to the controls its risk level demands.
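To make that obligation concrete, here is a minimal sketch, in Python, of a hybrid pipeline in which each component is logged with its governance properties. The function names (call_llm), the policy threshold, and the post-check rule are hypothetical placeholders, not a reference implementation:

```python
# Minimal sketch of a hybrid pipeline: deterministic routing and post-checks
# wrap a probabilistic model call, and each component is recorded in an audit
# trail with a flag for whether its behavior is reproducible by design.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    component: str        # which component acted
    deterministic: bool   # does this component guarantee reproducible behavior?
    detail: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def call_llm(prompt: str) -> str:
    """Placeholder for a probabilistic model call; outputs may vary across runs."""
    return "draft answer"

def handle_request(question: str, requested_amount: float) -> tuple[str, list[AuditRecord]]:
    trail: list[AuditRecord] = []

    # Deterministic routing rule: the same input always takes the same path.
    if requested_amount > 50_000:
        trail.append(AuditRecord("policy_router", True, "escalated to human review: amount over threshold"))
        return "escalated", trail

    # Probabilistic component: logged as non-reproducible.
    answer = call_llm(question)
    trail.append(AuditRecord("llm", False, "draft answer generated"))

    # Deterministic post-check: verifiable against a fixed rule.
    if "guarantee" in answer.lower():
        trail.append(AuditRecord("output_filter", True, "blocked: prohibited commitment language"))
        return "blocked", trail

    trail.append(AuditRecord("output_filter", True, "passed post-checks"))
    return answer, trail
```

The point of the trail is not the logging itself; it is that anyone reviewing the outcome can see which steps carry guarantees and which do not.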

The governance implications of these three types are not equivalent. Yet most AI procurement treats them as if they are.

If a system can produce two different answers to the same question under the same conditions — and both could be acted on — it is not operating at the level of rigor that high-stakes decisions require.

THE FRAMEWORK

Risk tier should determine architecture — not the other way around

The governance failure is not that organizations use probabilistic AI. It is that they use it regardless of use case risk, and then attempt to retrofit governance onto a system that was never designed to support it.

The right sequence is the reverse. Start with a clear-eyed assessment of the risk tier of the use case. Then specify — in your requirements, in your contracts, and in your architecture review — what level of explainability, traceability, and accountability that tier demands. Then procure accordingly.

Here is an example of how one might define functional and non-functional requirements for explainability in a way that informs architecture. Ultimately, you would want to do the same for all of your principles (fairness, privacy, and so on).
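As a rough illustration only (the tier names, example use cases, and requirement wording below are hypothetical placeholders, not a published rubric), a risk-tier map that carries architecture and explainability requirements into procurement might look like this:

```python
# Illustrative sketch: expressing explainability requirements per risk tier as
# concrete, checkable procurement requirements rather than statements of intent.
RISK_TIERS = {
    "low": {
        "example_use_case": "internal content summarization",
        "required_architecture": "probabilistic acceptable",
        "explainability": "model card and documented limitations",
        "reproducibility_required": False,
        "audit_trail_per_output": False,
    },
    "elevated": {
        "example_use_case": "customer-facing recommendations",
        "required_architecture": "probabilistic with context orchestration and controls",
        "explainability": "source attribution for every response",
        "reproducibility_required": False,
        "audit_trail_per_output": True,
    },
    "vigilant": {
        "example_use_case": "credit decisions, patient triage",
        "required_architecture": "deterministic, or rigorously bounded hybrid",
        "explainability": "per-decision trace: data lineage, rules applied, rationale",
        "reproducibility_required": True,
        "audit_trail_per_output": True,
    },
}

def requirements_for(tier: str) -> dict:
    """Look up the non-negotiable requirements for a use case's risk tier."""
    return RISK_TIERS[tier]
```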

The required architecture is not a suggestion. It is a requirement. A high-risk use case — one involving consequential decisions about people, high financial exposure, or outcomes that are difficult to reverse — demands deterministic levels of accountability, whether achieved through deterministic design or rigorously bounded hybrid controls.

THE ACCOUNTABILITY GAP

"A human reviewed it" is not governance — it is liability laundering

Human oversight is the layer that makes AI governance real. But only if it is designed to be real.

Consider what a human reviewer actually needs in order to provide meaningful oversight of an AI output. We hold colleagues to standards of trustworthiness — credibility, reliability, alignment of interest, integrity over time. The AI field has developed specific vocabulary for the equivalent properties in systems. A reviewer needs to be able to interrogate four things (sketched as an evidence record after this list):

· Transparency is a question about the model: what data, what methodology, and were those the right validated choices for this use case?

· Explainability is a question about the output: not how the model generally works, but why it produced this specific decision for this specific person.

· Observability is a question about agent behavior over time: is the system still performing correctly as conditions change, data shifts, and it interacts with other systems?

· Robustness Against Adversaries is a question about cybersecurity: has the model been tampered with since you last assessed it? Is it still operating within the boundaries of its original design and mandate?
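A hedged sketch of the evidence record those four properties imply, with illustrative field names rather than any standard schema:

```python
# Sketch of the evidence a reviewer would need surfaced before sign-off;
# field names are illustrative placeholders, not a standard.
from dataclasses import dataclass

@dataclass
class OversightEvidence:
    # Transparency: about the model
    training_data_summary: str
    validation_report_uri: str
    # Explainability: about this specific output
    decision_rationale: str
    evidence_sources: list[str]
    # Observability: behavior over time
    drift_metrics_uri: str
    last_performance_review: str     # ISO date of most recent check
    # Robustness against adversaries
    integrity_check_passed: bool     # model artifacts unchanged since validation
    last_adversarial_test: str       # ISO date

def reviewer_can_oversee(e: OversightEvidence) -> bool:
    """A sign-off without this evidence is a rubber stamp, not oversight."""
    return bool(e.decision_rationale and e.evidence_sources and e.integrity_check_passed)
```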

Now ask yourself: does your current AI architecture give a human reviewer access to any of that? If the system is probabilistic and the reviewer has no visibility into data lineage, no reasoning trace, and no consistency guarantee — what exactly are they reviewing?

The answer, in most deployments, is: the output. They are reviewing a number, a recommendation, a flag. They are not reviewing the reasoning. They are not reviewing the evidence. They are not, in any meaningful sense, in the loop.

You get more of the behaviors you measure. If your human reviewer is measured on throughput rather than quality of oversight, you have engineered a rubber stamp — and called it governance.

This is what liability laundering looks like in practice. A human is placed in the process. Their sign-off is documented. The accountability box is checked. But the conditions for genuine oversight — the architectural transparency, the training, the authority to halt, the feedback mechanism — were never established.

The fix is threefold: build the architecture that makes genuine human oversight possible, choose reviewers with the domain expertise to understand the context in which the AI operates, and measure those reviewers on the right things.

WHAT TO DO

What C-suite leaders need to require right now

This is not primarily a technology problem. It is a procurement and governance language problem. The organizations that get this right will be the ones that build the vocabulary — and the contractual teeth — to specify what they actually need from AI systems.

Four things need to change:

1.  Require architecture disclosure in every AI procurement.

Vendors should be required to specify whether their system is probabilistic, deterministic, or a hybrid, to document the explainability properties of each component, and to grant the buyer the right to independently validate those claims through audit, testing, and continuous monitoring. "AI" is not a sufficient category for a procurement decision. "Probabilistic language model with retrieval-augmented generation" is.

2.  Map use cases to risk tiers before selecting architecture.

Not every AI use case requires deterministic architecture. A low-risk content summarization tool has different requirements than an AI system making credit decisions or triaging patient care. The risk assessment should happen before vendor selection, not after deployment. And the required architecture — with its explainability and accountability properties — should be written into the requirements, not left to the vendor's discretion.

3.  Operationalize principles as functional and non-functional requirements — not statements of intent.

"We are committed to explainable AI" is a statement of intent. "At the Vigilant tier, the system must provide a full audit trail for each output, including source data with documented lineage and provenance, validated test/re-test reliability, and controls ensuring explanations are bound to the evidence record" is a functional requirement. The difference is enforceable. One of these belongs in a contract. The other belongs in a press release.

4.  Establish outcome baselines before deployment.

You cannot measure whether an AI system is causing harm if you did not establish a baseline before deploying it. Without pre-deployment outcome data, you can observe outputs but cannot attribute change — you have no way to distinguish the system’s effect from what would have happened anyway. This is not a technical detail. It is the structural precondition for accountability. Every AI deployment should require a documented baseline of the outcomes it is intended to influence, against which post-deployment performance is continuously measured. Where a baseline does not exist, building one is the prerequisite — not a subsequent task.
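As a minimal sketch of that baseline discipline (the metric name and tolerance below are placeholders), the outcome is documented before deployment and every later measurement is compared against that record:

```python
# Sketch: record the pre-deployment baseline for the outcome the AI is meant
# to influence, then attribute change only relative to that documented baseline.
from statistics import mean

def record_baseline(outcomes: list[float]) -> dict:
    """Document pre-deployment outcomes for the metric of interest."""
    return {"metric": "approval_rate", "baseline_mean": mean(outcomes), "n": len(outcomes)}

def compare_to_baseline(baseline: dict, post_deployment: list[float], tolerance: float = 0.05) -> dict:
    post_mean = mean(post_deployment)
    delta = post_mean - baseline["baseline_mean"]
    return {
        "baseline_mean": baseline["baseline_mean"],
        "post_deployment_mean": post_mean,
        "delta": delta,
        "exceeds_tolerance": abs(delta) > tolerance,   # trigger for review, not a verdict
    }
```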
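Returning to point 3: a hedged sketch of how the Vigilant-tier requirement might be turned into an acceptance check. It assumes a hypothetical system_under_test whose outputs expose a decision and an audit_trail; the point is that a functional requirement can be verified rather than merely asserted:

```python
# Sketch of an acceptance check: identical inputs must yield identical decisions,
# and every output must carry a complete audit trail bound to its evidence.
REQUIRED_TRAIL_FIELDS = {"source_data", "data_lineage", "rules_applied", "rationale"}

def check_vigilant_tier(system_under_test, test_cases, runs: int = 5) -> list[str]:
    failures = []
    for case in test_cases:
        outputs = [system_under_test.decide(case) for _ in range(runs)]

        # Test/re-test reliability: the same input must produce the same decision.
        if len({o.decision for o in outputs}) != 1:
            failures.append(f"{case!r}: decision not reproducible across {runs} runs")

        # Audit trail completeness for every output.
        for o in outputs:
            missing = REQUIRED_TRAIL_FIELDS - set(o.audit_trail)
            if missing:
                failures.append(f"{case!r}: audit trail missing {sorted(missing)}")
    return failures
```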

The regulatory environment makes all four of these non-optional in several jurisdictions.

A note on regulatory urgency

These are not aspirational standards. They are already law or binding guidance in several operating jurisdictions. The EU AI Act (Article 9) requires that high-risk AI systems have a documented risk management system including human oversight measures and accuracy requirements — before deployment. In the United States, federal AI hiring guidance was withdrawn in January 2025 — but the underlying obligations under Title VII and the ADA remain enforceable, and employer liability for discriminatory AI outcomes is actively being litigated in federal court. The Federal Reserve’s SR 11-7 model risk guidance imposes ongoing validation and governance obligations on AI models used in financial services decisions. Organizations operating across these jurisdictions are not choosing whether to govern AI rigorously. They are choosing whether to do so proactively or in response to an enforcement action.

The bottom line

AI governance that lives only in policy documents is not governance. It is aspiration. Real governance is built into the architecture that gets procured, deployed, and measured.

The question is not whether your organization has an AI ethics policy. The question is whether the AI systems you are buying can actually fulfill the accountability obligations you have assumed — to your customers, your regulators, and the people most affected by these decisions.

Risk should inform architecture. Architecture should be written into requirements. Requirements should have contractual teeth.

The organizations that will lead on AI governance in the next five years are not the ones with the most sophisticated ethics frameworks. They are the ones that have translated those frameworks into the language of procurement, architecture, and measurement — and held their vendors to it.

That work starts with learning to ask the right question: not "do we have AI governance?" but "does our AI architecture support the accountability we have promised?"


About the author

Phaedra Boinodiris

Phaedra Boinodiris is Global Lead of IBM Consulting's Responsible AI Practice and author of AI for the Rest of Us. She is a Fellow of the Royal Society of Arts and was asked by NC Governor Josh Stein to serve as a member of the NC AI Leadership Council.