a UK employment tribunal ruled earlier this year that a screenshot of a ChatGPT conversation was admissible as evidence. the case involved a wrongful dismissal claim. the claimant's legal team had asked ChatGPT to describe general industry norms around a particular HR process and submitted the transcript as evidence of what "reasonable" practice looks like. the tribunal let it in.
the tribunal made its displeasure clear in the ruling. it noted that ChatGPT outputs can't be cross-examined, that the model might have been trained on data reflecting outdated norms, and that the specific output might differ on a different day or with slightly different phrasing. it admitted the screenshot anyway, because existing evidence rules provide no clear grounds for exclusion: whatever the concerns about reliability, a ChatGPT transcript is still a document, and documents are admissible.
this might seem like a legal curio. it isn't. it's a preview of how AI outputs are going to interact with legal processes over the next decade, and the implications for liability are genuinely strange.
what it means when AI output has legal weight
the immediate issue isn't that this particular ruling went wrong. a screenshot describing general HR norms is low stakes, and the output probably reflected something defensible about industry practice. but the ruling establishes the principle before the hard cases arrive, and the hard cases will be much higher stakes.
imagine the same logic applied to a medical AI agent. a patient's care team uses an AI assistant to help interpret symptoms and suggest differential diagnoses. the AI's output is logged, as AI outputs in regulated industries increasingly are. that log is a document. in a subsequent malpractice case, one side wants to introduce the AI's recommendation as evidence that the doctor was — or wasn't — following what the AI identified as the appropriate protocol. the AI is now effectively testifying about what the right course of action was.
or imagine a financial AI agent that produces investment recommendations. the recommendations are logged. in an arbitration proceeding over a disputed trade, the logs are produced in discovery. the AI's output now says, with apparent authority, what it thought was the right thing to do at a specific time. it can't explain its reasoning. it can't be deposed. it can't be impeached with prior inconsistent statements. it just said what it said, and it's in the record.
the reliability problem and who bears it
there is a genuine reliability problem with using AI outputs as evidence, and the UK tribunal was right to note it. language models don't have stable opinions: the same query run twice can produce different outputs, and a slight rephrasing can produce meaningfully different ones. the model's "view" on industry norms, best practices, or factual matters reflects its training data and the specific prompt, not a considered position that can be defended under cross-examination.
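to make this concrete, here's one crude way a deployer (or an insurer's auditor) might measure that instability: ask the model the same question repeatedly and score how much the answers drift. this is a sketch, not anyone's production tooling; `ask_model` is a hypothetical stand-in for whatever client the deployment actually uses, and raw text similarity is only a rough proxy, since two differently worded answers can agree in substance.

```python
import itertools
from difflib import SequenceMatcher
from typing import Callable


def output_stability(ask_model: Callable[[str], str],
                     question: str, runs: int = 5) -> float:
    """Ask the same question `runs` times and return the mean pairwise
    text similarity of the answers (1.0 means identical every time).

    `ask_model` is a hypothetical stand-in for the deployment's real
    client; a serious audit would also pin the model version and
    sampling settings, since both affect reproducibility.
    """
    assert runs >= 2, "need at least two runs to compare"
    answers = [ask_model(question) for _ in range(runs)]
    pairs = list(itertools.combinations(answers, 2))
    return sum(
        SequenceMatcher(None, a, b).ratio() for a, b in pairs
    ) / len(pairs)
```

a score like this wouldn't settle anything in court, but it's exactly the kind of number an insurer could ask for when underwriting.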
but here's the harder question: who bears the reliability risk when an AI output ends up in a legal proceeding?
the company that deployed the AI seems like the obvious answer. if you deployed an AI agent whose outputs could end up in court, you should have built something reliable enough that its outputs hold up. this is the vendor liability logic. but "reliable enough for legal proceedings" is a much higher bar than "reliable enough for a product recommendation" or even "reliable enough for a medical decision support tool." it requires a kind of epistemic traceability — the ability to explain where a conclusion came from, why it was reached, and how stable it is — that most AI systems fundamentally don't have.
the party whose people used the AI also bears some responsibility. the legal team that submitted the ChatGPT screenshot made a choice about what to put before the tribunal. the doctor whose care notes reference an AI recommendation made a choice to document that reference. at some point, the human who decided to rely on the AI output and preserve it in a context where it might have legal consequences is making a decision the AI can't make for itself.
the insurance structure this implies
if AI outputs can end up as evidence in legal proceedings, then the companies deploying AI agents have a new category of exposure: liability arising from how their AI's logged outputs perform under legal scrutiny. this is different from, and in addition to, the direct liability from the AI giving bad advice. it's the liability of having a record of that advice, a record that can later be used against you.
this kind of exposure is closest to professional liability — the risk that documented work product turns out to be wrong or inadequate in a way that creates claims. professional liability insurance for humans is priced partly based on what kind of work the professional does and what the record-keeping standards are. a doctor's professional liability premium reflects, among other things, the quality of their documentation practices. better documentation means better defensibility, which means lower expected losses.
the same logic should apply to AI agents. an AI agent with comprehensive, auditable logs — including not just outputs but the inputs, the model version, the system prompt, and the confidence calibration — is in a better position when its outputs end up in a legal proceeding than one where you have the output but not the context. insurers should price AI liability coverage based partly on what the company can actually produce in discovery: how good is the audit trail, how stable are the outputs over repeated queries on the same question, what was the model trained on.
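a minimal sketch of what one such record might look like, using hypothetical field names (nothing here comes from a real logging product): capture the full context alongside the output, and hash the record so you can later show it wasn't altered.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional
import hashlib
import json


@dataclass
class AuditRecord:
    """One logged AI interaction, with the context discovery would ask for."""
    prompt: str          # the exact input the model received
    output: str          # the exact text the model produced
    model_version: str   # pinned identifier, not just "latest"
    system_prompt: str   # the instructions the model ran under
    temperature: float   # sampling settings affect reproducibility
    confidence: Optional[float] = None  # calibration data, if the system produces any
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Content hash of the record, to support a later claim that
        the logged output is exactly what the model said at the time."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()
```

the point isn't these specific fields; it's that "what can you produce in discovery" becomes a property of the schema you chose on day one.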
the deeper issue
what the UK ruling really exposes is that we're deploying AI systems that produce outputs with real-world consequences before we've developed legal frameworks for treating those outputs appropriately. the AI gives advice that people rely on, those people take actions based on that advice, those actions have outcomes, those outcomes sometimes end up in litigation. at that point the AI's original output becomes relevant, and we have no good answer to the question of what it actually means.
is an AI output more like an expert witness's opinion? a reference book? a calculation by a tool? a statement by an employee? each of these categories implies different evidentiary treatment, different standards of admissibility, different ways of weighing it against other evidence. we're going to be working out the answer case by case, in jurisdictions around the world, over the next decade. the companies whose AI outputs end up in those cases are going to be paying the legal bills for that process.
the interesting question is whether the early cases will settle quickly enough that we don't get much case law, or whether some of them will go to verdict in a way that creates binding precedents. the Air Canada chatbot case was resolved cheaply, at small-claims scale, and didn't generate much law. but as the stakes increase, with AI outputs showing up in employment discrimination cases, malpractice cases, and securities cases, the incentive to fight rather than settle will increase, and the resulting precedents will matter a lot.
the companies building AI agents for high-stakes professional contexts (legal, medical, financial) should be watching these cases closely. not just for compliance reasons, but because the emerging case law is going to define what "adequate AI liability coverage" needs to include. the UK employment tribunal ruling is a small case. the framework taking shape around it is not small at all.