Predictive Coding (eDiscovery)

A TAR technique where the system learns from attorney-coded seed documents to predict relevance across the full document set; court acceptance depends on validation methodology.

Last reviewed: 2026/05/19

Definition

Why It Matters for Lawyers

How AI Tools Handle It

Frequently Asked Questions

Q: How large does a seed set need to be for reliable predictive coding?
There is no universal standard. Seed sets of 500 to 2,000 documents are commonly used, but representativeness and coding consistency matter more than size. A small, consistently coded, representative seed set outperforms a large, inconsistently coded one. Consult your eDiscovery vendor on seed set design for your specific document population.
Q: What is the difference between predictive coding and active learning?
Traditional predictive coding trains the model on a fixed seed set before predicting across the document population. Active learning iteratively updates the model as reviewers code documents during the live review, continuously improving predictions. Active learning typically achieves higher efficiency because the model improves throughout the review rather than only at the outset.
Q: Can I use predictive coding for privilege review?
Privilege review is a distinct legal determination that does not train well in predictive coding models. Privilege depends on legal standards, attorney-client relationship facts, and document-specific context that predictive coding models cannot reliably learn. Privilege review on documents identified as relevant through predictive coding should be conducted through attorney review.
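
To illustrate the FAQ answer on predictive coding versus active learning, here is a minimal Python sketch of an active learning loop. Everything in it is an assumption made for illustration: the toy word-overlap `score` function stands in for a real model, `attorney_codes` stands in for a human reviewer, and picking the least certain document is only one possible prioritization strategy. It does not reflect how any particular platform works.

```python
from collections import Counter

def score(text, coded):
    """Toy relevance estimate in [0, 1] (hypothetical stand-in for a model).

    Counts how often the document's words appeared in documents coded
    relevant versus non-relevant. Returns 0.5 when no word is known.
    """
    rel, non = Counter(), Counter()
    for doc, label in coded:
        (rel if label else non).update(doc.lower().split())
    words = text.lower().split()
    hits_rel = sum(rel[w] for w in words)
    hits_non = sum(non[w] for w in words)
    total = hits_rel + hits_non
    return 0.5 if total == 0 else hits_rel / total

def most_uncertain(unreviewed, coded):
    """Pick the document whose score is closest to 0.5."""
    return min(unreviewed, key=lambda d: abs(score(d, coded) - 0.5))

def active_learning_review(unreviewed, coded, attorney_codes, rounds):
    """Each round: select a document, have it coded, add it to the training data.

    Because score() is recomputed from `coded` on every call, each new
    coding decision changes the next round's predictions.
    """
    unreviewed, coded = list(unreviewed), list(coded)
    for _ in range(min(rounds, len(unreviewed))):
        doc = most_uncertain(unreviewed, coded)
        unreviewed.remove(doc)
        coded.append((doc, attorney_codes(doc)))
    return coded, unreviewed

def attorney_codes(doc):
    # Hypothetical reviewer: treats anything mentioning pricing as relevant.
    return "pricing" in doc

seed = [("pricing agreement draft", True), ("lunch menu friday", False)]
pool = ["pricing menu update", "lunch friday reminder", "pricing agreement signed"]
coded, remaining = active_learning_review(pool, seed, attorney_codes, rounds=1)
```

In a fixed seed set workflow, by contrast, `seed` would be coded once and the scores computed a single time for the whole pool.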

Related Concepts

Capability

Active Learning (eDiscovery)

An iterative ML approach in eDiscovery where the model continuously updates relevance predictions as reviewers code documents, prioritizing the most uncertain documents for review.

Related Tools

  • DISCO

    AI-native legal technology platform for eDiscovery, case building, and legal holds used by Am Law 200 firms.

  • Casepoint

    Cloud-based eDiscovery and legal hold platform with AI-powered document review for government and enterprise.

Related Reading

  • How We Score Legal AI Tools: The 5-Dimension Methodology
  • AI Hallucination in Legal Research: A Practitioner's Guide

Last reviewed: 2026/05/19. Definitions are written by the LawyerAI Editorial team. We do not accept affiliate commissions; Featured placement is clearly labeled and does not influence editorial content.

Predictive coding is a technology-assisted review (TAR) methodology in which a supervised machine learning model is trained on a seed set of documents that attorneys have coded as relevant or non-relevant, and then predicts relevance for the remaining unreviewed documents. The model learns the characteristics of relevant documents from the attorney-coded examples and ranks or classifies the full document set accordingly. What distinguishes predictive coding from other forms of TAR is this supervised learning approach, which requires a carefully selected and consistently coded seed set. Court acceptance of predictive coding as a discovery methodology depends on the transparency and rigor of the validation process.
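
As a rough illustration of the train-then-rank idea, the following Python sketch fits a very small naive Bayes classifier on a four-document seed set and ranks two unreviewed documents. The classifier choice, the function names, and the sample documents are all assumptions made for this example; commercial platforms use their own models and features, and the sketch assumes the seed set contains at least one relevant and one non-relevant document.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(seed_set):
    """seed_set: list of (text, is_relevant) pairs coded by attorneys."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                p = (word_counts[label][word] + 1) / (total + len(vocab))
                score += sign * math.log(p)
    return score

def rank(model, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda d: relevance_score(model, d), reverse=True)

# Hypothetical seed set and unreviewed population.
seed_set = [
    ("pricing agreement with distributor", True),
    ("revised pricing terms for the contract", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
]
unreviewed = ["parking garage closed friday", "draft pricing agreement attached"]

model = train(seed_set)
ranked = rank(model, unreviewed)
```

Here the pricing email ranks above the parking notice because its known words appear only in the relevant seed documents.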

Predictive coding emerged as a practical response to document volumes in large-scale litigation that made traditional linear review economically and logistically impractical. A case with five million documents cannot be reviewed document by document at any reasonable cost; predictive coding concentrates attorney review on the documents most likely to be relevant, at a fraction of the cost.

The seed set selection and coding quality are the most critical inputs. A poorly selected or inconsistently coded seed set produces a poorly calibrated model that misclassifies documents. Attorneys responsible for predictive coding workflows must invest in seed set quality, not merely volume.
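
One way to put a number on coding consistency is to have two reviewers code the same sample and compare their decisions. The sketch below computes Cohen's kappa, a standard agreement statistic that corrects for chance agreement. Its use here is an illustrative assumption rather than a requirement of any court or platform, and the sample codings are invented.

```python
def cohens_kappa(coder_a, coder_b):
    """Agreement between two reviewers' relevant/non-relevant calls.

    Both arguments are equal-length lists of booleans for the same documents.
    1.0 is perfect agreement; 0.0 is what chance alone would produce.
    """
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    p_a = sum(coder_a) / n  # share coder A marked relevant
    p_b = sum(coder_b) / n  # share coder B marked relevant
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both coders gave every document the same single label
    return (observed - expected) / (1 - expected)

# Hypothetical double-coded sample of ten documents.
reviewer_1 = [True, True, True, False, False, False, True, False, True, False]
reviewer_2 = [True, True, False, False, False, False, True, False, True, True]

kappa = cohens_kappa(reviewer_1, reviewer_2)  # about 0.6 for this sample
```

The two reviewers agree on eight of ten documents, but half of that agreement would be expected by chance, which is why kappa is lower than the raw 80% figure.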

Validation methodology is the aspect of predictive coding that courts scrutinize most often. The supervising attorney must understand recall (the share of all relevant documents that the review found) and precision (the share of documents flagged as relevant that actually are relevant), be able to explain the validation sampling process, and be prepared to defend the chosen completeness threshold, typically expressed as a target recall, if challenged.
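
The arithmetic behind these metrics is simple enough to sketch. The Python below computes precision and recall from a validation sample in which each document has both a model prediction and an attorney's coding. The sample counts and the 0.75 recall target are placeholders chosen for illustration; they are not a legal standard, and a real protocol would also address sample size and confidence intervals.

```python
def validation_metrics(sample):
    """sample: list of (predicted_relevant, attorney_relevant) boolean pairs."""
    tp = sum(1 for pred, truth in sample if pred and truth)      # found and relevant
    fp = sum(1 for pred, truth in sample if pred and not truth)  # flagged but not relevant
    fn = sum(1 for pred, truth in sample if not pred and truth)  # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical random validation sample of 100 documents.
sample = (
    [(True, True)] * 8       # correctly predicted relevant
    + [(True, False)] * 2    # predicted relevant, attorney says no
    + [(False, True)] * 2    # predicted non-relevant, attorney says relevant
    + [(False, False)] * 88  # correctly predicted non-relevant
)

TARGET_RECALL = 0.75  # placeholder; the actual target is case-specific

precision, recall = validation_metrics(sample)
meets_target = recall >= TARGET_RECALL
```

In this sample the model found 8 of the 10 relevant documents (recall 0.8), and 8 of the 10 documents it flagged were relevant (precision 0.8).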

Relativity supports multiple TAR implementations, including traditional predictive coding with seed set review and its Active Learning module, which implements continuous active learning, a more efficient variant. DISCO offers predictive coding integrated with its review platform, with validation reporting built into the workflow.

Casepoint provides predictive coding with automated reporting on training progress, precision, and recall, supporting the documentation needed to defend the methodology in court or in meet-and-confer discussions with opposing counsel.