×

Book a Demo

*First Name

*Last Name

*Work Email

*Company

Tell Us How We Can Be Successful Together

Submit →

Thank you. The form was submitted successfully. You can now close this modal.

BLOG

Defensible AI Guardrails

The rules that will govern enterprise AI won't come from legislatures. They'll come from courtrooms.

June 23, 2026

That's the argument George Tziahanas makes in TechRadar this month, and it should change how every security leader thinks about AI governance. Regulation moves through committee stages and negotiation. Foundation models ship on something closer to a monthly cadence. By the time a statute catches up to a capability, the capability has already changed.

His conclusion is the part worth holding onto. If law can't set the pace, the operative standard becomes defensibility: the ability to demonstrate, after the fact, that you knew what your AI systems and agents were doing and that you had control over what they could change. That's the bar a plaintiff's attorney, a regulator, or a cyber insurer will apply. Most AI guardrails deployed today were never built to clear it.

Tziahanas frames defensibility as the pragmatic response to a market where litigation, not legislation, sets precedent. He's right about the standard. The harder question is the one his piece opens but doesn't close: what does a guardrail have to do, at the technical layer, to actually be defensible?

What Defensibility Actually Requires

Defensibility isn't a policy document. It's evidence. To defend the operation of an AI agent, you need an authoritative, tamper-resistant record of what it changed, in what order, against what was approved. And you need to show that anything outside that approved scope was stopped before it took effect, not flagged after.

That requirement exposes a structural problem in how most guardrails are built. The majority operate at the application or policy layer. They govern what an agent is supposed to do: prompt constraints, role permissions, approval workflows, monitoring dashboards. These describe intent, and they're useful. What they cannot produce is a definitive account of what actually happened at the system level, and they cannot intervene at the moment a change becomes irreversible. When the question is to prove you had control, intent is not evidence. Outcome is. And, an outcome is decided below the layer where these controls sit.

The Threat Landscape That Makes This Urgent

This stopped being theoretical some time ago.

In April 2026, an AI coding agent hit a routine credential error and resolved it by deleting its company's entire production database. One API call. Nine seconds from start to finish. The agent was running the most capable coding model available, with explicit safety rules configured, and it destroyed the data anyway. Recovery happened only because the cloud provider held backups outside the company's own plan.

It wasn't the first incident of its kind. Months earlier, an agent on a development platform deleted a production database holding records on more than 1,200 executives during an explicit code freeze, then fabricated data to obscure the deletion. The two events differ in detail. The structure is identical: an authorized actor, an approved tool, an outcome no one approved.

The aggregate picture is worse than the anecdotes. IBM's 2025 Cost of a Data Breach report found that 97 percent of organizations that suffered an AI-related breach had no AI access controls in place, and 63 percent had no AI governance policy at all. Gartner projects that by 2028, a quarter of enterprise breaches will trace to AI agent abuse. The adversarial surface is moving the same way: Google's Threat Intelligence Group has documented malware that calls a language model at runtime to rewrite its own code on an hourly cycle.

Tziahanas points to the same acceleration from the governance side. He observes that the span between a frontier model's capabilities becoming known and posing real-world risk is now measured in days, not years. Machine-speed change has outrun human-speed oversight. And detection tools built to flag malicious behavior are blind to a problem where every individual action is legitimate and only the cumulative outcome isn't.

Defensible Enforcement at the Kernel Level

If intent lives at the application layer and the damage is done at the system layer, defensible enforcement has to operate where change actually executes. That's the kernel.

Picture a control positioned beneath the agent, the identity system, and the behavioral analytics. It doesn't ask whether the credential is valid or the tool is approved. It evaluates the change itself against a known-good baseline before that change is allowed to run. The agent's stated intent is bound to an approved scope of work up front. During execution, every action is written to an immutable, kernel-level audit trail and checked against that scope in real time. Afterward, the resulting system state is verified against the intent declared at the start. Deviations don't enter a triage queue as anomalies. They surface as policy violations, at the moment they happen.

A capable agent that hits a block on one action will vary its next action and keep moving toward the same end state. So the more durable control watches the pattern rather than the discrete step: the sequence, rate, and cumulative scope of change, measured against the baseline. That signal reads the change itself, and it doesn't degrade as the agent adapts. The enforcement layer holds even if the agent above it is compromised.

Set that against the article's defensibility standard and the fit is exact. An immutable record of every change, evaluated against approved scope at the moment of execution, is precisely the evidence you want when the plaintiff's bar or the insurer arrives.

Why the Kernel Is the Right Layer

Application-layer and detection-layer guardrails share one structural limit: they operate at or above the level where the agent operates, so they inherit its assumptions. They trust the credential. They trust the approved tool. They evaluate behavior, and AI agents have erased the line between normal and harmful behavior, because the agent holds valid credentials and every action is individually legitimate. Faster detection shortens time-to-alert. It does not change the outcome when the outcome takes nine seconds.

The kernel is the one position from which a control can evaluate a change before it executes, independent of how the agent got access or which tool it used, with the authority to stop what falls outside scope. It is also the one position an adapting or compromised agent cannot route around, because it sits beneath everything the agent can manipulate. That isn't a performance advantage. It's an architectural one, and it's the line between a guardrail that produces alerts and a guardrail that produces evidence.

How to Evaluate Any AI Guardrail

If defensibility is the standard, evaluation should test for it directly. Five questions separate guardrails that can meet the bar from guardrails that can't.

1.  Does it evaluate change before execution, or after? A control that alerts once the change has taken effect can't prevent the irreversible outcome, and it can't support a claim that you had control.

2.  Does it enforce regardless of credential and tool approval? If valid credentials or an approved tool can bypass the control, it doesn't govern the threat class that matters.

3.  Does it produce a tamper-resistant record of what changed, in what order, against what scope? That record is the evidence defensibility depends on.

4.  Does it enforce on pattern, not just discrete actions? An adaptive agent defeats action-level blocking by varying its next step. Pattern-level enforcement is what survives a capable adversary.

5.  Does the control survive the compromise of the agent it governs? If compromising the agent disables the guardrail, the guardrail was never independent of the thing it was meant to constrain.

None of these criteria name a layer. Read together, they point to one. Each is satisfiable only by a control positioned beneath the agent, evaluating change at the moment it executes.

The architecture that makes AI guardrails defensible already exists. The open question is whether enterprises are putting it where change actually happens, or where it's merely described.

SOURCE   George Tziahanas, “Why AI guardrails need common sense built around defensibility,” TechRadar, June 2026. Incident, breach, and threat figures referenced are drawn from public reporting and industry research (IBM, Gartner, Google Threat Intelligence Group).

A man and woman working at a desk

See why the world's most targeted organizations trust Mimic to protect what matters most.