April 2026 · 7 min read

How Should AI Agents Inherit Your Existing Enterprise Security?

AI agents need access to your systems to be useful, but most agent frameworks handle security through prompt instructions ("don't access unauthorized data"), not infrastructure enforcement. In 2025, 43% of tested MCP server implementations had command injection flaws. The right architecture mirrors your existing permissions at the infrastructure level so that a compromised agent physically can't access what it shouldn't, not because you asked it nicely, but because the operating system won't let it.

DN · Founder, Astrohive

Why is agent security different from application security?

Traditional applications do what they're programmed to do. An API endpoint either has access to a database table or it doesn't. The attack surface is the code.

AI agents are different. They interpret instructions, use tools, and make decisions about what to access and when. A prompt injection can rewrite an agent's intent mid-session. A malicious tool description can redirect an agent's behavior without the user seeing it happen. The attack surface isn't just the code. It's everything the agent reads.

OWASP recognized this distinction by publishing a dedicated Top 10 for Agentic Applications in December 2025, separate from their LLM Top 10. The agentic list includes risks that don't exist in traditional software: identity and privilege abuse where leaked credentials let agents operate beyond intended scope, and insecure inter-agent communication where spoofed messages misdirect entire agent clusters.

The core problem is straightforward. Your enterprise already has a permission model. Roles, access levels, data classifications, compliance boundaries. When you deploy AI agents, do those agents respect that model? Or do they operate in a parallel universe where security is enforced by hoping the prompt instructions stick?

What has actually gone wrong in production?

2025 was the year agent security moved from theoretical to real. Nine major MCP-specific breaches were documented, and several made the severity clear.

Elastic Security Labs tested MCP server implementations and found that 43% contained command injection flaws, while 30% permitted unrestricted URL fetching. As the researchers noted: "Any text being fed to the LLM has the potential to rewrite instructions on the client end" (Beretta, Carlock, Pease, September 2025).

Invariant Labs demonstrated a tool poisoning attack that could silently exfiltrate a user's entire WhatsApp chat history by combining a malicious MCP server with a legitimate WhatsApp MCP instance. The attack required zero user approval of malicious tools. In a separate disclosure, they showed how seemingly innocent tools could embed hidden instructions directing AI agents to access SSH keys, config files, and credentials: instructions invisible to the user but visible to the model.

The first zero-click prompt injection exploit in a production system hit Microsoft 365 Copilot. CVE-2025-32711 (EchoLeak) achieved remote, unauthenticated data exfiltration via a single crafted email with zero user interaction (Reddy & Gujral, 2025).

And CVE-2025-12420 (BodySnatcher) showed that with only a target's email, an attacker could impersonate an admin and execute a ServiceNow AI agent to create backdoor accounts with full privileges, bypassing MFA and SSO entirely. ServiceNow's affected products are used by nearly half of Fortune 100 companies (AppOmni, January 2026).

These aren't theoretical. They're production systems, real users, real data.

  • Apr 2025, Invariant: WhatsApp history exfiltrated via tool poisoning
  • Sept 2025, Elastic: 43% of tested MCP servers had injection flaws
  • 2025, EchoLeak: zero-click M365 Copilot data exfiltration
  • Jan 2026, BodySnatcher: CVSS 9.3, ServiceNow admin impersonation

Four agent security incidents from 2025-2026. Real systems, real data, real breaches.

What does "security inheritance" actually mean?

Security inheritance is a simple principle: an AI agent should operate under the intersection of the user's permissions and the agent's own declared maximum permissions. The agent can never exceed its own ceiling, even if the user has higher clearance.

The agent operates at the intersection of user permissions and its own ceiling. It can never exceed either.
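As a minimal sketch of that intersection rule (the permission strings and data structures below are illustrative assumptions, not part of any specific framework):

```python
# Sketch: effective access is the intersection of the user's permissions and
# the agent's declared ceiling. The permission strings are illustrative.
USER_PERMISSIONS = {"docs:read", "docs:write", "finance:read", "admin:manage_users"}
AGENT_MAX_PERMISSIONS = {"docs:read", "docs:write", "crm:read"}  # the agent's ceiling

def effective_permissions(user_perms: set, agent_ceiling: set) -> set:
    """The agent may only use permissions held by BOTH the user and the agent."""
    return user_perms & agent_ceiling

def is_allowed(action: str, user_perms: set, agent_ceiling: set) -> bool:
    return action in effective_permissions(user_perms, agent_ceiling)

# The user holds admin:manage_users but the agent's ceiling does not, so the agent
# cannot use it; the agent's crm:read is likewise unusable because the user lacks it.
assert is_allowed("docs:read", USER_PERMISSIONS, AGENT_MAX_PERMISSIONS)
assert not is_allowed("admin:manage_users", USER_PERMISSIONS, AGENT_MAX_PERMISSIONS)
assert not is_allowed("crm:read", USER_PERMISSIONS, AGENT_MAX_PERMISSIONS)
```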
Enterprise permission models are typically composed of four layers:

  • RBAC (Role-Based) defines what a role can do: "Editors can modify documents"
  • ABAC (Attribute-Based) adds contextual refinement: "Only if the document is Internal classification AND the user has clearance"
  • ReBAC (Relationship-Based) provides structural scoping: "Only if the user has a relationship to this workspace"
  • ACL (Access Control List) applies per-resource overrides: "This specific document is shared with this specific person"

The hybrid model (RBAC baseline, ABAC refinement, ReBAC scoping, ACL override) is the production standard. 94.7% of organizations use RBAC, and 86.6% view it as their primary model (Gartner, 2025). But RBAC alone isn't enough for agents, because agents make contextual decisions that a static role assignment doesn't cover. Gartner found that organizations implementing ABAC experience 73% fewer access-related security incidents compared to those using RBAC alone.
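One way to picture the hybrid stack is a single decision function that walks the layers in order. Everything below (roles, attributes, workspace membership) is invented for illustration, not taken from a specific product:

```python
# Sketch of a hybrid authorization check: RBAC baseline, ABAC refinement,
# ReBAC scoping, ACL override. All policy data here is illustrative.
ROLE_GRANTS = {"editor": {"document:read", "document:edit"}, "viewer": {"document:read"}}

def can_access(user: dict, action: str, doc: dict) -> bool:
    # RBAC: does the user's role grant this action at all?
    if action not in ROLE_GRANTS.get(user["role"], set()):
        return False
    # ABAC: contextual refinement, e.g. classification vs. clearance.
    if doc["classification"] == "internal" and not user.get("has_clearance", False):
        return False
    # ReBAC: structural scoping, e.g. membership in the document's workspace.
    if doc["workspace"] not in user["workspaces"]:
        return False
    # ACL: per-resource overrides are checked last.
    acl = doc.get("acl")
    if acl is not None and user["id"] not in acl:
        return False
    return True

user = {"id": "u1", "role": "editor", "has_clearance": True, "workspaces": {"ws-7"}}
doc = {"classification": "internal", "workspace": "ws-7", "acl": {"u1", "u2"}}
print(can_access(user, "document:edit", doc))  # True
```

For an agent, the same check runs twice: once against the human principal's permissions and once against the agent's own declared ceiling.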

For AI agents, the permission check needs to happen at four enforcement points:

  1. User prompt to agent (gateway) - Is this user allowed to ask this agent to do this?
  2. Agent to knowledge base (retrieval) - Can this agent see this data for this user?
  3. Agent to external tools (tool execution) - Can this agent call this API with these parameters?
  4. Agent output to user (output) - Does the response contain data above the user's clearance?

If any of these four checks is missing, you have a security gap.

Four enforcement points where security checks must happen. A missing check is a security gap.
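A compressed sketch of those four checks around one agent turn follows. Every function and data structure is a stand-in, not a real framework API; the point is that each hop has its own enforcement hook:

```python
# Sketch of the four enforcement points around a single agent turn.
# All functions are placeholders standing in for real policy and agent code.
def check_gateway(user, agent) -> bool:                       # 1. gateway
    return agent["name"] in user["allowed_agents"]

def retrieve(query: str, allowed: set) -> list:               # 2. retrieval
    docs = [("onboarding guide", "docs:read"), ("Q3 forecast", "finance:read")]
    return [text for text, needed in docs if needed in allowed]

def check_tool_policy(agent, tool: str) -> bool:              # 3. tool execution
    return tool in agent["allowed_tools"]

def redact_above_clearance(text: str, user) -> str:           # 4. output
    return text if user["clearance"] >= 2 else "[redacted]"

def handle_request(user, agent, prompt: str) -> str:
    if not check_gateway(user, agent):
        raise PermissionError("gateway: request not permitted")
    allowed = user["permissions"] & agent["max_permissions"]   # inheritance rule
    context = retrieve(prompt, allowed)
    if not check_tool_policy(agent, "search_docs"):            # checked per tool call
        raise PermissionError("tool execution: search_docs blocked")
    answer = f"Answer drawn from: {context}"                   # stand-in for the model
    return redact_above_clearance(answer, user)

user = {"allowed_agents": {"helper"}, "permissions": {"docs:read"}, "clearance": 2}
agent = {"name": "helper", "max_permissions": {"docs:read", "finance:read"},
         "allowed_tools": {"search_docs"}}
print(handle_request(user, agent, "summarize the onboarding guide"))
```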

Why don't prompt-level instructions work?

Because prompts are suggestions, not enforcement. You can write "never access data outside your assigned domain" in a system prompt. But a prompt injection can override that instruction. A tool poisoning attack can bypass it entirely. And there's no audit trail proving the instruction was followed.

Anthropic's own MCP specification acknowledges this directly: "Descriptions of tool behavior such as annotations should be considered untrusted, unless obtained from a trusted server."

The distinction matters. Prompt-level security is like putting a "please don't steal" sign in a store. Infrastructure-level security is locking the merchandise in a case. One relies on compliance. The other enforces it.

Prompt-level security is a suggestion. Infrastructure-level security is enforcement.

What does infrastructure-level agent security look like?

The most concrete implementation is kernel-level sandboxing, where the operating system itself prevents an agent from accessing unauthorized resources.

NVIDIA's NemoClaw (17.4K GitHub stars in its first 12 days, March 2026) demonstrated this approach by wrapping AI agents in sandboxed containers using three Linux kernel features:

  • Landlock restricts which files and directories a process can access
  • seccomp restricts which system calls a process can make
  • Network namespaces restrict which network endpoints a process can reach
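NemoClaw's own wiring isn't reproduced here. As a rough analogue under stated assumptions (bubblewrap standing in for the Landlock/seccomp/namespace setup, paths invented), the sketch below launches an agent binary with no network and a read-only view of one workspace directory:

```python
# Rough analogue only: launch an agent process with OS-enforced filesystem and
# network restrictions via bubblewrap. This is NOT NemoClaw's implementation;
# it illustrates boundaries the kernel enforces regardless of what the prompt says.
import subprocess

def run_sandboxed(agent_binary: str, workspace: str) -> int:
    cmd = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",            # read-only system directories
        "--ro-bind", "/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--ro-bind", workspace, "/workspace",    # the only data the agent can see
        "--unshare-net",                         # no network inside the sandbox
        "--unshare-pid",
        "--die-with-parent",
        agent_binary,
    ]
    # Empty environment: no API keys or tokens ever enter the sandbox.
    return subprocess.run(cmd, env={}, check=False).returncode

# Hypothetical paths; a production setup would also add seccomp filters and a
# narrowly scoped route to the local inference gateway instead of "no network".
# run_sandboxed("/opt/agent/agent-binary", "/srv/workspaces/acme")
```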

The key innovation is binary-scoped network access. Even if an endpoint is on the allow list, only specific binaries can reach it. If a prompt injection tries to use curl or python3 to hit an approved API, it's blocked because only the designated agent binary has network access.

Credentials never enter the agent's execution environment. The agent makes requests to a local inference gateway. The gateway, running on the host outside the sandbox, injects API keys before forwarding the request. If the sandbox is compromised, the keys are safe.
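A toy version of that gateway pattern is sketched below. The upstream URL, port, and header name are assumptions, and the narrow path from the sandbox to this local listener (for example a bound socket the sandbox is allowed to reach) is elided; the point is that the key lives only in the host-side process:

```python
# Toy credential-injecting gateway running on the host, outside the sandbox.
# Upstream URL, port, and header are illustrative assumptions.
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.example.com"            # hypothetical upstream API
API_KEY = os.environ["UPSTREAM_API_KEY"]        # exists only in the host process

class GatewayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        upstream_req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={
                "Authorization": f"Bearer {API_KEY}",  # injected here, never in the sandbox
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(upstream_req) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())

if __name__ == "__main__":
    # The sandboxed agent is pointed at this listener instead of the real API.
    HTTPServer(("127.0.0.1", 8081), GatewayHandler).serve_forever()
```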

Microsoft followed with the Agent Governance Toolkit (April 2026), the first framework to address all 10 OWASP agentic AI risks with deterministic policy enforcement at sub-millisecond latency (<0.1ms p99). It implements execution rings, cryptographic identity via decentralized identifiers, and automated governance verification mapped to EU AI Act, HIPAA, and SOC 2.

NIST's AI Risk Management Framework, applied to generative AI through its Generative AI Profile (NIST AI 600-1), provides the governance structure, noting that "LLMs are already able to discover vulnerabilities in systems and write code to exploit them." The framework structures risk management across four functions: governing, mapping, measuring, and managing.

How do you isolate data between clients or business units?

Multi-tenant AI systems need architectural isolation, not just permission checks. Three models exist:

  • Silo: a separate database per tenant. Highest isolation, highest cost. Best for regulated industries and government.
  • Bridge: separate schemas in a shared database. Medium isolation, medium cost. Best for enterprise SaaS.
  • Pool: row-level security in shared tables. Lowest isolation, lowest cost. Best for early-stage products and internal tools.

Higher isolation means higher cost, but stronger security guarantees.

Regardless of model, the principle is that no data path should exist between tenants:

  • Separate embedding namespaces in vector stores
  • Domain-scoped agent sessions locked to a single tenant, with no cross-domain memory
  • Output classification watermarking, where outputs inherit the classification of the highest-classified input they touched, preventing silent data laundering from confidential inputs to public outputs
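Output classification watermarking reduces to a small rule that is easy to enforce mechanically; a minimal sketch, with invented labels and document fields:

```python
# Sketch: an output inherits the classification of the highest-classified input
# the agent touched. Levels and document fields are illustrative.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

def classify_output(inputs: list) -> str:
    """Highest classification among everything the agent read for this answer."""
    return max((d["classification"] for d in inputs),
               key=LEVELS.__getitem__, default="public")

retrieved = [
    {"id": "handbook", "classification": "public"},
    {"id": "q3-forecast", "classification": "confidential"},
]
print(classify_output(retrieved))   # "confidential"
# A response labelled "confidential" must never be routed to a public channel,
# which blocks silent laundering of confidential inputs into public outputs.
```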

Research on Burn-After-Use (BAU) architectures, where ephemeral agent contexts are destroyed after each task, shows a 92% defense success rate against cross-session contamination.
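The BAU idea can be pictured as an agent context that exists only for the duration of one task; this sketch is a generic illustration, not the architecture evaluated in that research:

```python
# Sketch of a burn-after-use context: per-task scratch memory is created fresh
# and destroyed afterwards, so nothing carries over into the next session.
from contextlib import contextmanager

@contextmanager
def ephemeral_context(task_id: str):
    context = {"task_id": task_id, "scratch": [], "retrieved": []}
    try:
        yield context
    finally:
        # Wipe in-memory state; a real system would also purge caches, scratch
        # vector-store namespaces, and any per-task credentials.
        context.clear()

with ephemeral_context("task-42") as ctx:
    ctx["scratch"].append("intermediate reasoning")
    # ... run the agent for this single task ...

print(ctx)   # {} : nothing survives for the next task to read
```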

By the numbers:

  • 43% of tested MCP servers had command injection flaws; 30% allowed unrestricted URL fetching (Elastic Security Labs, September 2025)
  • 9 major MCP-specific breaches were documented in 2025, including 437K+ affected installations (MCP Security Research, 2025)
  • 94.7% of organizations use RBAC; those that add ABAC see 73% fewer access-related incidents (Gartner, 2025)
  • 21% of executives have complete visibility into agent permissions and data access patterns (Enterprise AI Security Survey)
  • 53%+ of organizations' AI agents operate without consistent security monitoring (AI Agent Security Report, 2025)
  • 92% defense success rate for Burn-After-Use architecture against cross-context contamination (BAU Architecture Research)
  • CVSS 9.3: full admin impersonation via an AI agent with only a target's email (AppOmni, January 2026)

What the research says

"Any text being fed to the LLM has the potential to rewrite instructions on the client end."

Carolina Beretta, Gus Carlock, Andrew Pease, Elastic Security Labs, September 2025

"A Tool Poisoning Attack occurs when malicious instructions are embedded within MCP tool descriptions that are invisible to users but visible to AI models."

Luca Beurer-Kellner, Marc Fischer, Invariant Labs, April 2025

"Descriptions of tool behavior such as annotations should be considered untrusted, unless obtained from a trusted server."

Model Context Protocol Specification, Anthropic, November 2025

Our take

The pattern we see across enterprise deployments is a dangerous gap between how carefully organizations secure their traditional applications and how casually they deploy AI agents. A company that would never ship an API endpoint without authentication will deploy an agent with broad tool access and a system prompt that says "be careful with sensitive data."

What we've found works is a principle we call security mirroring: the agent's security posture should mirror the user's, enforced at infrastructure level. If a user can't access a file, their agent can't access it. Not because the prompt says so, but because the retrieval layer filters it out before the model ever sees it. Not because the agent is well-behaved, but because the sandbox physically prevents access.
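A minimal sketch of that retrieval-layer filter follows (the in-memory store and group labels are invented; in practice the same predicate is pushed down into the vector database query):

```python
# Sketch of retrieval-layer mirroring: documents the user cannot access are
# dropped BEFORE the model ever sees them. Store and groups are illustrative.
def permitted_context(query: str, user_groups: set, store: list) -> list:
    return [doc["text"] for doc in store
            if doc["allowed_groups"] & user_groups]     # mirrors the existing ACL

store = [
    {"text": "Public onboarding guide", "allowed_groups": {"everyone"}},
    {"text": "M&A negotiation notes", "allowed_groups": {"corp-dev"}},
]
print(permitted_context("onboarding", {"everyone", "engineering"}, store))
# ['Public onboarding guide']  (the model never receives the M&A notes)
```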

The investment isn't in building a new security system. It's in making your existing one agent-aware. RBAC already defines who can do what. ABAC already handles contextual rules. The work is connecting those systems to the four enforcement points (gateway, retrieval, tool execution, output) so that agents inherit the security architecture you already built, rather than operating outside it.

Key takeaway

Don't build agent security from scratch. Mirror your existing enterprise permissions at the infrastructure level, enforce them at four points (prompt gateway, retrieval, tool execution, output), and sandbox agent execution so that a compromised agent physically can't reach what it shouldn't. The 2025 breach timeline proves that prompt-level instructions are not a security boundary.

FAQ

Can't I just tell the AI agent not to access sensitive data?

No. Prompt instructions are not a security boundary. Prompt injection attacks can override them, tool poisoning attacks can bypass them entirely, and there's no audit trail proving they were followed. In 2025, the first zero-click prompt injection (CVE-2025-32711) exfiltrated data from Microsoft 365 Copilot via a single email, no user interaction required.

What is tool poisoning?

A tool poisoning attack embeds malicious instructions in an MCP tool's description. The instructions are invisible to the user but visible to the AI model. Invariant Labs demonstrated this by creating a tool that appeared to perform simple addition but secretly directed the agent to exfiltrate SSH keys and config files.

How does kernel-level sandboxing work for AI agents?

Technologies like Linux Landlock, seccomp, and network namespaces restrict what an agent process can access at the OS level. The agent can only read files in its approved directory, make approved system calls, and reach approved network endpoints. Even if the agent is fully compromised by a prompt injection, it physically cannot access anything outside its sandbox.

Do I need to rebuild my permission system for AI agents?

No. The goal is to make your existing RBAC/ABAC system agent-aware by adding enforcement at four points: the prompt gateway, the retrieval layer, the tool execution layer, and the output layer. Your existing roles and policies stay the same. You're extending them to cover a new type of principal (the agent).

What about multi-tenant isolation?

Architectural isolation is required, not just permission checks. Separate embedding namespaces in vector stores, domain-scoped agent sessions, and output classification watermarking (outputs inherit the classification of the highest-classified input) prevent data leakage between tenants.

What compliance frameworks apply to AI agents?

NIST AI 600-1 provides the governance framework. SOC 2 Type II and GDPR are table stakes for enterprise deployment. The EU AI Act introduces additional requirements for high-risk AI systems. Microsoft's Agent Governance Toolkit (April 2026) is the first framework that maps automated governance checks to all three.

How do I know if my agents are secure right now?

If you can answer "yes" to all four: (1) agents can only access data their user can access, enforced at infrastructure level, (2) tool execution is sandboxed, (3) cross-tenant data paths don't exist architecturally, (4) you have audit trails for every agent action. If any answer is "no" or "we rely on prompt instructions," you have a gap.
