[h] home [b] blog [n] notebook

hallucinations are the wrong thing to worry about

the number one thing journalists, politicians, and AI skeptics focus on when they talk about AI risk is hallucinations. the AI made something up. the AI cited a paper that doesn't exist. the AI got the date wrong. here is why this is mostly the wrong frame, and what the people who actually have to manage AI risk in production are worried about instead.

MIT Sloan recently published a working paper surveying senior executives — CTOs, CISOs, CROs, heads of AI at large organizations — on what AI risks concern them most. they rated six risk categories from 1 to 5. reliability (which includes hallucinations) scored lowest. average 3.68. 4% of respondents rated it "no concern at all." data and privacy scored highest: average 4.25. not a single respondent across the entire study marked data and privacy as "no concern."

not one person.

this is the opposite of the public discourse. the public discourse is obsessed with what AI says. the people responsible for actually deploying AI in enterprises are obsessed with what AI can see, touch, and leak.

why hallucinations are tractable

hallucinations are real. the Air Canada chatbot that told a passenger he could apply for a bereavement fare retroactively, a policy the airline then refused to honor, was a real incident that cost the airline a small claims judgment and a lot of embarrassment. the lawyers who filed briefs citing AI-generated case citations that didn't exist lost cases and their credibility. these things happen and they matter.

but hallucinations have properties that make them manageable. they're often detectable. when an AI gives you information you can verify against external sources, the verification step catches many hallucinations. they're also somewhat predictable in distribution — models hallucinate more on obscure facts, on current events, on highly specific quantitative claims. you can design systems that route those categories to human review. and the field is improving fast; retrieval-augmented generation, where the model grounds its answers in retrieved documents rather than parametric memory, dramatically reduces hallucination rates on factual questions.
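the routing idea can be sketched in a few lines. everything here is a placeholder, the category markers especially; a real system would use a trained classifier, but the shape of the gate is the point:

```python
# sketch: route high-hallucination-risk query categories to human
# review. the keyword markers are illustrative stand-ins for a real
# classifier, not a production heuristic.

RISKY_MARKERS = {
    "quantitative": ("how many", "what percentage", "exact figure"),
    "current_events": ("today", "this week", "latest"),
    "citation": ("cite", "paper", "source for"),
}

def risk_categories(query: str) -> set[str]:
    """return the hallucination-risk categories a query falls into."""
    q = query.lower()
    return {cat for cat, markers in RISKY_MARKERS.items()
            if any(m in q for m in markers)}

def route(query: str) -> str:
    """send risky queries to human review, everything else to the model."""
    return "human_review" if risk_categories(query) else "model"
```

the design choice that matters: the gate runs before the model answers, so you pay the review cost only on the categories where hallucination rates are known to be worst.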

this doesn't mean hallucinations are solved. it means they're a tractable engineering problem with visible progress. you can measure them. you can reduce them. you can build systems that detect and hedge them.

why data leakage is not tractable in the same way

data leakage through AI systems is a different category of problem. when a model hallucinates, it invents something. when it leaks, it reveals something real. and the mechanisms of leakage are much harder to close.

consider retrieval-augmented generation, the same technology that reduces hallucinations. you connect the model to a knowledge base — internal documents, customer records, proprietary research — so it can look things up and give grounded answers. this works. it also means the model now has access to everything in the knowledge base, and a sufficiently clever question can extract things that shouldn't be exposed. access controls on individual documents don't cleanly translate to access controls on what the model can be induced to say about them.
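one mitigation is to enforce document ACLs per request, at retrieval time, so nothing the user couldn't open directly ever reaches the context window. a minimal sketch, where `Document`, the role-based ACL shape, and `authorized_context` are all illustrative assumptions:

```python
# sketch: filter retrieval results against the requesting user's
# roles before anything is placed in the model's context. the
# Document type and role-set ACLs are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset[str]

def authorized_context(user_roles: set[str],
                       retrieved: list[Document]) -> list[Document]:
    """keep only documents the requesting user could open directly.

    the key property: the filter runs per-request on retrieval
    results, so the model never sees content the user couldn't see,
    and so can't be induced to paraphrase it.
    """
    return [d for d in retrieved if user_roles & d.allowed_roles]
```

this doesn't solve everything (aggregation across many allowed documents can still leak), but it closes the most direct path: the model repeating a document the user had no right to read.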

context window leakage is similarly underappreciated. in multi-tenant AI deployments, information from one user's session needs to be strictly isolated from another's. the context window is finite; conversations get summarized; those summaries feed into future interactions. the failure modes here are subtle. you're not looking for a single dramatic breach. you're looking for information slowly seeping across session boundaries in ways that are hard to detect and harder to attribute.
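one defensive pattern is to bind every summary to the tenant that produced it and refuse to load it anywhere else. a toy sketch, with an in-memory store standing in for whatever persistence layer you actually use:

```python
# sketch: tenant-scoped summary storage. a summary saved under one
# tenant id can never be loaded into another tenant's session. the
# store is an illustrative in-memory dict, not a real backend.

class SummaryStore:
    def __init__(self) -> None:
        # summary_id -> (owning tenant, summary text)
        self._store: dict[str, tuple[str, str]] = {}

    def save(self, summary_id: str, tenant_id: str, text: str) -> None:
        self._store[summary_id] = (tenant_id, text)

    def load(self, summary_id: str, tenant_id: str) -> str:
        """load a summary, refusing cross-tenant access loudly."""
        owner, text = self._store[summary_id]
        if owner != tenant_id:
            raise PermissionError("cross-tenant summary access")
        return text
```

the point of raising instead of returning nothing: slow cross-boundary seepage is hard to notice, so the isolation check should fail loudly enough to show up in monitoring.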

prompt injection makes this worse. if you have an AI agent with access to your CRM and someone can inject instructions into data the agent reads, they can steer the agent toward revealing things it shouldn't. the agent isn't hallucinating; it's being directed. the output is real information, extracted on someone else's behalf.
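a crude first line of defense is to treat everything the agent reads as untrusted input and flag instruction-like content before it hits the prompt. the patterns below are illustrative; real injections won't match a short regex list, which is part of why this problem is hard:

```python
# sketch: flag instruction-like content in data the agent reads.
# the pattern list is a crude illustrative heuristic, not a real
# injection detector; it shows where the check belongs, not how
# to win the arms race.

import re

INJECTION_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"you are now",
    r"reveal .* (password|secret|api key)",
]

def looks_injected(untrusted_text: str) -> bool:
    """return True if retrieved data contains instruction-like content."""
    t = untrusted_text.lower()
    return any(re.search(p, t) for p in INJECTION_PATTERNS)
```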

what executives are actually afraid of

read through the verbatim responses in that MIT survey and the mental model that keeps appearing isn't "the AI told someone the wrong thing." it's much closer to: an attacker gets in, takes what they want, and leaves no trace.

one financial sector CEO described their nightmare scenario as "infiltration, in and out in seconds and leaving no trace. and loss of control of operations." this is not the hallucination problem. this is the AI agent as unwitting accomplice problem — an agent with broad access that gets manipulated into exfiltrating data or taking actions in ways that look like normal agent activity from the logs.

another executive's stated concern was not knowing how many AI agentic tools are running inside their own tenant. they can't audit what they can't see. if an AI agent connected by one employee to the corporate Slack can be prompted to pull messages it shouldn't have access to and summarize them in a response, the CISO might never know it happened. the attack surface is defined by the union of everything every AI agent in the organization can access. nobody knows what that is.

the adversarial case is what separates the risks

hallucinations are accidental. they happen regardless of whether there's a malicious actor involved. they're a quality problem. you can improve quality over time by measuring it and investing in fixes.

data leakage and adversarial attacks are intentional. the attacker is trying to find the weakest point. they're not limited to documented attack categories — they're searching the space of possible attacks in real time, motivated by real payoff. this makes the risk profile fundamentally different: it doesn't decrease as the system improves, because the attacker also improves. you're in an arms race, not in a bug-fix cycle.

security professionals understand this distinction deeply. it's why security posture is about defense in depth and assumed breach, not about "we've patched all the known vulnerabilities so we're safe." the hallucination-focused AI discourse doesn't have this intuition, because it's coming from people who think about AI quality, not AI adversarial robustness.

this matters for what you build and how you test

if your primary AI risk concern is hallucinations, you invest in retrieval, grounding, confidence calibration, and output verification. reasonable. measurable. improvable.

if your primary AI risk concern is adversarial manipulation and data leakage, you invest in access control architecture, adversarial testing, input validation, session isolation, and anomaly detection on agent behavior. different skill set. harder to measure. can't be improved by a better model alone.
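anomaly detection on agent behavior can start very simple: compare each agent's data-access volume against its own baseline. a sketch with a hypothetical z-score threshold; real deployments would track many more signals than raw counts:

```python
# sketch: flag an agent whose data-access volume jumps far above
# its own historical baseline. the z=3 threshold and the stdev
# floor are illustrative choices, not tuned values.

from statistics import mean, stdev

def is_anomalous(history: list[int], current: int, z: float = 3.0) -> bool:
    """flag when current access count exceeds baseline mean + z * stdev."""
    if len(history) < 2:
        return False  # not enough baseline to judge yet
    mu, sigma = mean(history), stdev(history)
    # floor sigma so a perfectly flat baseline still allows small drift
    return current > mu + z * max(sigma, 1.0)
```

this is the "looks like normal agent activity from the logs" problem in miniature: a compromised agent that stays inside its baseline passes this check, which is why anomaly detection is a layer, not a solution.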

the gap between what the public discourse focuses on and what enterprise security professionals focus on matters because it shapes where investment goes. companies optimizing for "doesn't hallucinate" aren't necessarily building the properties that make an AI agent safe to deploy in high-stakes environments. the risk that actually breaks you might be the one you didn't think to test for.


the Air Canada case gets cited constantly because it's legible. an AI said a wrong thing, a person acted on it, there was harm, there was a judgment. the McDonald's AI hiring platform breach that exposed 64 million job applicants' records barely registers in AI risk discourse even though it's orders of magnitude larger in impact. one is a story about a dumb chatbot. the other is a story about AI systems with access they didn't need and no one thought to restrict. that's the harder problem, and it's the one getting less attention.