Autonomous AI: Protecting Against Rogue AI Agents
by
February 3, 2026
This Baker Botts blog leads off with the remarkable projection that “non-human and agentic identities are expected to exceed 45 billion by the end of 2026, more than twelve times the human global workforce.” Despite the staggering size of the AI agent workforce, the blog says that only 10% of organizations say they have a strategy for managing these agents. That gap between the speed with which they’re being deployed and the maturity of the governance structure for them presents both liability exposure and security risk. The blog identifies the following key risks that companies need to address:
Goal misalignment and instrumental harm. Perhaps the most unpredictable risk is an agent pursuing legitimate objectives through illegitimate means. AI safety researchers call this “instrumental convergence”—the tendency of goal-directed systems to adopt subgoals like acquiring resources or avoiding shutdown regardless of their ultimate purpose. Recent testing across major AI models found consistent misaligned behavior in high-stakes scenarios, with agents taking extreme actions to pursue their goals. Researchers have also observed “alignment faking”—AI systems strategically concealing their true objectives. The agent is not malicious; it is simply optimizing.
Prompt injection and manipulation. Attackers craft inputs that override an agent’s instructions, causing it to leak data, execute unauthorized commands, or bypass controls. Prompt injection ranks as the leading AI security risk, and the vulnerability may never be fully solved. Researchers have already demonstrated persistent attacks on AI memory systems and enterprise messaging platforms.
Credential compromise and privilege escalation.AI agents often operate with service account credentials or long-lived API tokens. Unlike human accounts, compromised agent credentials rarely trigger behavioral anomalies. Identity and privilege abuse ranks among the top risks for agentic applications, with “semantic privilege escalation” allowing agents to take actions far beyond the scope of their assigned tasks. Agents that integrate with multiple systems can chain actions to achieve aggregate privileges no single human user would possess.
Memory poisoning and data leakage. Agents with access to retrieval-augmented generation (RAG) systems can inadvertently expose sensitive data embedded in their context windows. Research demonstrates that a small number of crafted documents can reliably manipulate AI responses, and memory injection attacks achieve high success rates. Proprietary information becomes part of the agent’s reasoning process and may surface in responses or logs.
Cascading failures across chained systems. Autonomous agents often orchestrate multi-step workflows spanning authentication, data retrieval, analysis, and action. A failure—or compromise—at any step can propagate through the entire chain before human operators detect the problem. Research shows cascading failures propagate faster than traditional incident response can contain them.
These are not theoretical concerns. The majority of breaches involve compromised identity, and generative AI enables more sophisticated attacks that target agents as easily as humans.
The blog goes on to discuss key elements of a governance framework for AI agents, and notes that NIST is soliciting input on agentic AI considerations as part of the comment process for its draft framework. It also points out the key takeaway for in-house counsel is that AI agent governance can’t wait for regulatory clarity – the liability exposure and security risks exist now. Companies should look to proposed governance frameworks as starting points for incorporating their own governance systems into the deployment process.