Predictive Coding (eDiscovery)

A technology-assisted review (TAR) technique in which the system learns from attorney-coded seed documents to predict relevance across the full document set. Court acceptance depends on the validation methodology.

Last reviewed: 2026-05-19

Definition

Why It Matters for Lawyers

How AI Tools Handle It

Frequently Asked Questions

Q: How large does a seed set need to be for reliable predictive coding?
There is no universal standard. Seed sets of 500-2,000 documents are commonly used, but representativeness and coding consistency matter more than size: a small, representative, consistently coded seed set outperforms a large, inconsistently coded one. Consult your eDiscovery vendor on seed set design for your specific document population.

Q: What is the difference between predictive coding and active learning?
Traditional predictive coding trains the model on a fixed seed set and then predicts relevance across the document population. Active learning updates the model iteratively as reviewers code documents during the review itself, so predictions keep improving. Active learning is typically more efficient because the model improves throughout the review rather than only at the outset.

Q: Can I use predictive coding for privilege review?
Privilege is a distinct legal determination that predictive coding models do not learn well. It depends on legal standards, the facts of the attorney-client relationship, and document-specific context that these models cannot reliably capture. Documents identified as relevant through predictive coding should still be reviewed for privilege by attorneys.

Last reviewed: 2026-05-19 by LawyerAI Editorial Team.

Related Concepts

Capability

Active Learning (eDiscovery)

An iterative ML approach in eDiscovery where the model continuously updates relevance predictions as reviewers code documents, prioritizing the most uncertain documents for review.
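
As a rough illustration of what "prioritizing the most uncertain documents" can mean, the Python sketch below selects the documents whose predicted relevance is closest to 0.5 and sends them to a reviewer, re-scoring between batches. Everything in it is an assumption made for illustration: the names most_uncertain, active_learning_loop, score_fn, and reviewer are hypothetical, the document IDs and scores are made up, and commercial eDiscovery platforms may select documents quite differently.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # document ID -> predicted probability of relevance


def most_uncertain(scores: Scores, k: int) -> List[str]:
    """Return up to k document IDs whose scores are closest to 0.5."""
    # Sort by distance from 0.5; ties are broken by document ID for determinism.
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:k]


def active_learning_loop(
    doc_ids: List[str],
    score_fn: Callable[[Dict[str, bool], List[str]], Scores],
    reviewer: Callable[[str], bool],
    batch_size: int,
    rounds: int,
) -> Dict[str, bool]:
    """Alternate between re-scoring unreviewed documents and collecting labels."""
    labels: Dict[str, bool] = {}
    for _ in range(rounds):
        unreviewed = [d for d in doc_ids if d not in labels]
        if not unreviewed:
            break
        # Hypothetical model step: retrain on the labels so far, score the rest.
        scores = score_fn(labels, unreviewed)
        for doc_id in most_uncertain(scores, batch_size):
            labels[doc_id] = reviewer(doc_id)  # attorney coding decision
    return labels


# Toy usage with made-up scores: the two least certain documents come first.
toy_scores = {"DOC-001": 0.97, "DOC-002": 0.52, "DOC-003": 0.08, "DOC-004": 0.45}
print(most_uncertain(toy_scores, 2))  # ['DOC-002', 'DOC-004']
```

In this sketch score_fn stands in for whatever model a platform retrains between batches. A fixed seed set approach would correspond to calling it once instead of once per round.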

Related Tools

DISCO
AI-native legal technology platform for eDiscovery, case building, and legal holds used by Am Law 200 firms.
Casepoint
Cloud-based eDiscovery and legal hold platform with AI-powered document review for government and enterprise.
