I don’t think people fully appreciate yet how agentic AI use cases are restricted by what Simon Willison coined the “lethal trifecta”. His article is a bit technical so I’ll try and break it down in more layman’s terms.
An AI agent becomes very high risk when these three things come together:
- Private data access – the agent can see sensitive information, like customer records, invoices, HR files or source code.
- Untrusted inputs – it also reads things you don’t control, like emails from customers, supplier documents, 3rd party/open source code or content on the web.
- The ability to communicate externally – it has a channel to send data out, like email, APIs or other external systems.
Each of those has risks on its own, but when you put all three together it creates a structural vulnerability we don’t yet know how to contain. That’s what makes the trifecta “lethal”. If someone wants to steal your data, you have no effective way to stop them. Malicious instructions hidden in untrusted inputs can trick the agent into exfiltrating (sending out) whatever private data it can see.
If you broaden that last point from “communicate externally” to “take external actions” (like sending payments, updating records in systems, or deploying code) the risk extends even further – not just leaking data, not just leaking data, but also doing harmful things like hijacking payments, corrupting information, or changing how systems behave.
It’s all a bit like leaving your car running with the keys in the ignition and hoping no one crashes it.
Where this matters most is in the types of “replace a worker” examples people get excited about. Think of:
- an AI finance assistant that reads invoices, checks supplier sites, and then pays them
- a customer support agent that reads emails, looks up answers on an online system and then issues refunds
- a DevOps helper that scans logs, searches the web for known vulnerabilities or issues, and then pushes config changes
All of those tick all three boxes – private data, untrusted input, and external actions – and that makes them unsafe right now.
There are safer uses, but they all involve breaking the loop – for example:
- our finance bot only drafts payments for human approval
- our support agent can suggest, but doesn’t issue refunds
- our DevOps helper only runs in a sandbox (a highly isolated environment)
Unless I have got this wrong, until we know how to contain the trifecta, the glossy vision of fully autonomous agents doesn’t look like something we can safely build.
And it may be that we never can. The problem isn’t LLM immaturity or missing features – it’s structural. LLMs can’t reliably tell malicious instructions from benign ones. To them, instructions are just text – there’s no mechanism to separate a genuine request from an attack hidden in the context. And because attackers can always invent new phrasings, the exploit surface is endless.
And if so, I wonder how long it will take before the penny drops on this.
Edit: I originally described the third element of Simon’s trifecta as “external actions”. I’ve updated this to align with Simon Willison’s original article, and instead expanded on the external actions point (partly after checking with Simon).