If you've evaluated legal AI tools over the past year or two, you've probably encountered the same architecture dressed up in different branding. Take the client's documents, split them into chunks of a few hundred words, convert those chunks into vector embeddings, and retrieve the closest matches when someone asks a question. This is retrieval-augmented generation — RAG — and it has become the default approach for almost every AI product in the legal market.

RAG solves a real problem. It keeps large language models anchored to source material, which reduces hallucination and makes outputs more trustworthy. For legal research, knowledge management, and contract review, the approach holds up reasonably well. But when you try to apply it to active litigation — to the messy, interconnected, procedurally layered reality of a complex dispute — you start running into limitations that no amount of prompt engineering can fix.

Fragmentation Destroys Context

RAG systems fragment everything. A 400-page expert report becomes dozens of isolated text chunks, each stripped of its position in the narrative and its relationship to the exhibits and appendices it references. A set of pleadings — where paragraph 47 of the Defence responds to paragraph 23 of the Particulars of Claim, which itself cross-refers to a letter sent eight months before the claim was issued — gets shredded into pieces that have no awareness of each other.
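The effect is easy to reproduce. The following is a minimal sketch of fixed-size chunking, assuming a naive word-count splitter; the pleading text is invented for illustration and the `chunk_words` helper is hypothetical, not taken from any particular product.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical pleading text, invented for this example.
defence = (
    "47. Paragraph 23 of the Particulars of Claim is denied. "
    "That letter did not extend the longstop date at all."
)

chunks = chunk_words(defence, size=10)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

The second chunk still says "That letter", but nothing in it records which letter, which paragraph it answers, or which pleading it came from. Production chunkers typically use overlap and smarter boundaries, which softens the problem without restoring the cross-document structure.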

A junior lawyer reading into a new case builds a mental architecture as they go, connecting an admission in one document to a denial in another, understanding how the chronology shapes the story the parties are telling. Standard RAG has no mechanism for this kind of structural reasoning, because it was never designed for documents that talk to each other across a procedural timeline.

Semantic Similarity Is a Blunt Instrument

Litigation turns on which facts are relevant, and that depends entirely on the issues in dispute, the positions each side has adopted, and where you are in the case's lifecycle. When a litigator asks about a payment made in March 2019, they care about it because it fell three days after a contractual longstop date that the claimant says was extended orally and the defendant says was not. The relevance lives in the relationship between documents, in the procedural context, in the legal argument being constructed around the facts.

A retrieval system that fetches chunks based on cosine similarity will surface passages mentioning payments, or March, or 2019 — which may or may not be the ones that count. It has no representation of the dispute's issues, no model of each party's case, and no way to reason about why one document carries more weight than another in light of everything else in the file.
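A toy version of that ranking makes the point concrete. This sketch assumes a bag-of-words stand-in for the embedding step (real systems use learned dense vectors, which behave less literally but still score on resemblance rather than legal significance). The chunk texts, dates, and names below are all invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical chunks from a case file.
chunks = {
    "invoice_note": "Payment of the March 2019 invoice was received in March 2019.",
    "longstop_clause": "The longstop date for completion is 12 March 2019 unless extended in writing.",
    "call_note": "On the call the parties discussed moving the deadline back by a fortnight.",
}

query = embed("payment made in March 2019")
scores = {name: cosine(query, embed(text)) for name, text in chunks.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under these assumptions the routine invoice note ranks first, the longstop clause second, and the call note, which is the only evidence bearing on an alleged oral extension, scores zero because it shares no vocabulary with the query.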

Error Tolerance in Litigation Is Essentially Zero

In a research context, you can live with imperfect retrieval. You skim the output, check the citations, fill in the gaps yourself. In active litigation, where you are preparing a chronology for counsel or verifying whether the other side admitted late delivery in their 4 June letter, you need the kind of precision that most AI architectures were simply never built to deliver.

Standard RAG introduces compounding failure points: retrieval might miss relevant chunks because the language doesn't overlap enough, generation might blend information from multiple sources in subtly misleading ways, and, as the context window fills up, the model loses track of which source said what. At the scale of a real dispute (thousands of documents, months of correspondence, multiple procedural applications), even a modest error rate means dozens of inaccuracies woven through your work product, each one a potential embarrassment or worse.
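To see how quickly this compounds, here is a back-of-the-envelope sketch. The stage accuracies and the number of assertions are purely illustrative assumptions, not measurements of any system, and the calculation assumes the stages fail independently.

```python
def expected_errors(stage_accuracies: list[float], n_assertions: int) -> float:
    """Expected number of wrong assertions if every stage must succeed
    and stages fail independently (a simplifying assumption)."""
    p_all_ok = 1.0
    for accuracy in stage_accuracies:
        p_all_ok *= accuracy
    return n_assertions * (1.0 - p_all_ok)


# Illustrative figures only: three stages (retrieval, generation,
# source attribution), each assumed 98% reliable, over 1,000 factual
# assertions in a chronology.
errors = expected_errors([0.98, 0.98, 0.98], 1000)
print(round(errors, 1))  # roughly 58.8
```

With those assumed figures, three stages that each look reliable in isolation still leave close to sixty wrong assertions in a thousand.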

What We Built Instead

Designing for litigation required us to rethink the architecture from first principles. Documents in our system are understood as structured artefacts — a Reply to Defence that responds to a Defence that responds to Particulars of Claim, each with internal paragraph numbering and cross-references that the system maps automatically. Witness statements remain connected to their exhibits. Correspondence chains maintain their threading and temporal context.
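As a rough illustration of what "mapping cross-references" can mean in practice, the sketch below models paragraphs as nodes and references as directed links. It is a hypothetical toy, not a description of Crimson's implementation; the class names, document names, and links are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ParaRef:
    """A paragraph within a named document (paragraph is None for a whole document)."""
    document: str
    paragraph: Optional[int] = None


@dataclass
class CaseGraph:
    """Directed links: each key responds to or refers to the listed targets."""
    links: dict[ParaRef, list[ParaRef]] = field(default_factory=dict)

    def link(self, source: ParaRef, target: ParaRef) -> None:
        self.links.setdefault(source, []).append(target)

    def chain(self, start: ParaRef) -> list[ParaRef]:
        """Follow links depth-first and return everything reachable from start."""
        seen: set[ParaRef] = set()
        stack, out = [start], []
        while stack:
            ref = stack.pop()
            if ref in seen:
                continue
            seen.add(ref)
            out.append(ref)
            stack.extend(reversed(self.links.get(ref, [])))
        return out


# Hypothetical links, echoing the pleadings example earlier in the article.
graph = CaseGraph()
defence_47 = ParaRef("Defence", 47)
poc_23 = ParaRef("Particulars of Claim", 23)
letter = ParaRef("Pre-action letter")
graph.link(defence_47, poc_23)
graph.link(poc_23, letter)

trail = graph.chain(defence_47)
print([(r.document, r.paragraph) for r in trail])
```

Starting from paragraph 47 of the Defence, the traversal returns the paragraph it answers and the letter that paragraph relies on, which is exactly the trail that isolated chunks cannot supply.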

We maintain a structured representation of each party's position on each disputed issue, which means that when a lawyer asks a question, the answer reflects the full procedural landscape rather than whatever fragments happened to score highest on a similarity metric. Retrieval techniques still play a role — they are one component of a much larger system — but they operate within a framework that understands what litigation looks like.
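One simple way to picture a per-issue record of party positions is a lookup keyed by issue. The sketch below is an assumption-laden illustration rather than the actual data model; the issue wording, the `Position` type, and the paragraph citations are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    party: str
    stance: str   # e.g. "asserts" or "denies"
    source: str   # where in the file the position is pleaded


# Hypothetical issue list for a single dispute.
positions: dict[str, list[Position]] = {
    "Was the longstop date extended orally?": [
        Position("Claimant", "asserts", "Particulars of Claim, para 23"),
        Position("Defendant", "denies", "Defence, para 47"),
    ],
}


def positions_on(issue: str) -> dict[str, Position]:
    """Return each party's recorded position on an issue, keyed by party."""
    return {p.party: p for p in positions.get(issue, [])}


by_party = positions_on("Was the longstop date extended orally?")
print(by_party["Defendant"].stance, "in", by_party["Defendant"].source)
```

With a record like this available, a question about a payment date can be answered alongside the issue it bears on and each side's pleaded stance, instead of from similarity scores alone.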

Picking Tools That Match the Work

The legal AI market is maturing quickly, and firms evaluating platforms should be asking architectural questions alongside the usual security and compliance checklist. How does the system handle cross-references between documents? Can it reason across the full case file or only over retrieved snippets? Does it maintain awareness of procedural context and party positions, or does every query start from scratch?

These questions are worth asking because disputes work is structurally complex in ways that general-purpose retrieval was never designed to handle. The right architecture makes the difference between a tool that occasionally helps and one your team relies on daily.

If you'd like to see how Crimson's purpose-built approach compares to general-purpose AI on your own case files, request a demo and we'll walk you through it.

Why RAG Alone Isn't Enough for Complex Litigation

Javier Pascual
