Predictive Coding (eDiscovery)
A technology-assisted review (TAR) technique in which the system learns from attorney-coded seed documents to predict relevance across the full document set; court acceptance depends on the validation methodology.
Last reviewed: 2026/05/19
Definition
Why It Matters for Lawyers
How AI Tools Handle It
Frequently Asked Questions
- Q: How large does a seed set need to be for reliable predictive coding?
- There is no universal standard. Seed sets of 500 to 2,000 documents are commonly used, but representativeness and coding consistency matter more than size. A small, consistently coded, representative seed set outperforms a large, inconsistently coded one. Consult your eDiscovery vendor on seed set design for your specific document population.
- Q: What is the difference between predictive coding and active learning?
- Traditional predictive coding trains the model once on a fixed seed set and then predicts across the document population. Active learning updates the model iteratively as reviewers code documents during the review itself, so predictions keep improving. Active learning typically achieves higher efficiency because the model improves throughout the review rather than only at the outset.
- Q: Can I use predictive coding for privilege review?
- Privilege review is a distinct legal determination that predictive coding models do not learn well. Privilege depends on legal standards, the facts of the attorney-client relationship, and document-specific context that these models cannot reliably learn. Documents identified as relevant through predictive coding should still be reviewed for privilege by attorneys.
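The contrast between fixed-seed training and active learning described in the FAQ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular review platform works: the word-count model, the function names (`train`, `score`, `fixed_seed_review`, `active_learning_review`), the synthetic corpus, and the use of uncertainty sampling to pick the next batch are all hypothetical choices made for the example. Some active learning workflows instead send the highest-ranked documents to reviewers.

```python
import random
from collections import Counter

def train(labeled):
    """Fit a toy model: count how often each word appears in relevant vs. non-relevant docs."""
    rel, non = Counter(), Counter()
    for words, is_relevant in labeled:
        (rel if is_relevant else non).update(set(words))
    return rel, non

def score(model, words):
    """Return a relevance score in [0, 1]; 0.5 means the model has no evidence either way."""
    rel, non = model
    pos = sum(rel[w] for w in set(words))
    neg = sum(non[w] for w in set(words))
    return 0.5 if pos + neg == 0 else pos / (pos + neg)

def fixed_seed_review(corpus, oracle, seed_ids):
    """Traditional workflow: attorneys code the seed set once, the model is trained once."""
    labeled = [(corpus[i], oracle(i)) for i in seed_ids]
    return train(labeled), len(labeled)

def active_learning_review(corpus, oracle, seed_ids, rounds, batch_size):
    """Iterative workflow: retrain after every batch of reviewer coding decisions."""
    coded = {i: oracle(i) for i in seed_ids}
    model = train([(corpus[i], y) for i, y in coded.items()])
    for _ in range(rounds):
        uncoded = [i for i in range(len(corpus)) if i not in coded]
        # Assumed selection strategy: uncertainty sampling (scores closest to 0.5 first).
        uncoded.sort(key=lambda i: abs(score(model, corpus[i]) - 0.5))
        for i in uncoded[:batch_size]:
            coded[i] = oracle(i)  # stands in for an attorney coding the document
        model = train([(corpus[i], y) for i, y in coded.items()])
    return model, len(coded)

# Synthetic corpus for illustration only; the terms and the 20% relevance rate are made up.
rng = random.Random(0)
RELEVANT_TERMS = ["merger", "pricing", "termination"]
NOISE_TERMS = ["lunch", "parking", "newsletter", "holiday"]

def make_doc(is_relevant):
    pool = RELEVANT_TERMS if is_relevant else NOISE_TERMS
    return [rng.choice(pool) for _ in range(3)] + [rng.choice(NOISE_TERMS)]

labels = [rng.random() < 0.2 for _ in range(200)]
corpus = [make_doc(y) for y in labels]
oracle = lambda i: labels[i]
seed_ids = list(range(10))

fixed_model, fixed_coded = fixed_seed_review(corpus, oracle, seed_ids)
active_model, active_coded = active_learning_review(
    corpus, oracle, seed_ids, rounds=3, batch_size=5
)
print("documents coded (fixed seed):", fixed_coded)
print("documents coded (active learning):", active_coded)
```

The structural difference is the loop: the fixed-seed function calls `train` once, while the active learning function retrains after each batch, so every later prediction reflects all coding decisions made so far.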
Related Concepts
Related Tools
Related Reading
Definitions are written by the LawyerAI Editorial team. We do not accept affiliate commissions; Featured placement is clearly labeled and does not influence editorial content.