LawyerAI: Independent Reviews

Continuous Active Learning (CAL)

An eDiscovery review method where the AI updates its relevance predictions after every reviewer decision, continuously prioritizing the most likely-relevant documents.

Last reviewed: 2026/05/22

Frequently Asked Questions

Is CAL the same as predictive coding?
Predictive coding is the broader category; CAL is one implementation. The older approach (TAR 1.0 or "predictive coding" as originally used in court filings) involves training on a fixed seed set and then freezing the model. CAL updates continuously throughout review. Most platforms marketed as "TAR" today use some form of continuous learning, but the specific implementation differs by vendor. When evaluating a platform, ask explicitly whether the model updates after every coding decision or only in batches.
Can opposing counsel challenge a CAL-based production?
Yes, and courts have entertained such challenges. The most common attacks are that the relevance standard applied was too narrow, that the recall target was insufficient, or that the review population excluded key custodians or data sources. The defense is documentation: a written relevance protocol, an agreed recall target, a logged validation sample, and transparent project reports from the platform. Courts have generally upheld properly documented CAL productions and have been reluctant to second-guess a producing party's chosen methodology absent specific evidence of a deficiency.
What happens if key documents are coded incorrectly early in the review?
Early miscodes affect model accuracy, particularly if they occur before enough correct decisions have been accumulated to dilute their influence. Most platforms allow QC reviewers to override earlier codings, and the model will update accordingly. The practical mitigation is to assign senior reviewers — not contract reviewers — to the first 500 to 1,000 documents, since those early decisions have disproportionate influence on the model's initial direction.

Related Concepts

Legal Practice

eDiscovery (Electronic Discovery)

The process of identifying, preserving, collecting, processing, reviewing, and producing electronically stored information in litigation or regulatory investigations under FRCP and equivalent rules.

Tech / Model

TAR vs. CAL in eDiscovery

Two AI-assisted document review approaches in eDiscovery: TAR 1.0 uses a frozen trained model; CAL continuously updates as reviewers code documents.

Related Tools

  • Relativity

    The industry-standard e-discovery platform for processing, reviewing, and analyzing large-scale document collections.

  • Everlaw

    Cloud eDiscovery with AI predictive coding and document summarization.

  • Logikcull

    Self-service eDiscovery platform designed for instant setup, used by solo firms through Fortune 500 legal teams.

  • Casepoint

    Cloud-based eDiscovery and legal hold platform with AI-powered document review for government and enterprise.

  • Reveal AI

    AI-powered eDiscovery platform with active learning, NLP analytics, and integrated review for complex litigation.

Related Reading

  • The Complete Guide to AI in Litigation and eDiscovery (2026)

Definitions are written by the LawyerAI Editorial team. We do not accept affiliate commissions; featured placement is clearly labeled and does not influence editorial content.

Editorially independent. Methodology open and versioned.
© 2026 LawyerAI Editorial

Definition

Continuous Active Learning (CAL) is an eDiscovery review methodology in which a machine-learning model continuously updates its document relevance predictions as reviewers code each document, immediately re-prioritizing the review queue to surface the most likely-relevant documents next. Unlike earlier predictive coding approaches, CAL has no defined training phase and no frozen model: every reviewer decision is a new training signal that reshapes the ranked document list in real time.

CAL emerged as a practical response to a structural weakness in first-generation Technology Assisted Review (TAR 1.0): the requirement to train the model on a fixed "seed set" before deployment meant that the model reflected the assumptions baked into that initial batch, with no mechanism to correct course as review progressed. CAL eliminates that constraint. The model learns from the entire body of reviewer decisions accumulated since the review began, not just a pre-selected subset, and it never stops learning until the review is closed.

Why It Matters for Lawyers

Document review is the largest single cost in eDiscovery. A 2012 RAND Institute for Civil Justice study of eDiscovery expenditures found that review for relevance, responsiveness, and privilege accounted for roughly 73 percent of the total cost of producing electronically stored information. Per-document rates range from $1 to $5 for contract reviewers and from $10 to $50 or more for attorney-level review of sensitive privilege determinations. In a large commercial dispute generating two million documents, that arithmetic produces review bills that can exceed the value of the underlying dispute.

CAL attacks this problem directly. Research by Maura Grossman and Gordon Cormack laid the groundwork. Their 2011 study was cited approvingly by the court in Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), the first federal decision to approve predictive coding for discovery. It found that technology-assisted review could match or exceed exhaustive manual review in recall and precision while requiring human review of only a fraction of the collection. Their later work evaluated CAL specifically. Well-implemented TAR and CAL processes are commonly reported to achieve recall above 75 percent while reviewing only 20 to 40 percent of the document collection. Follow-on studies through 2024 report CAL achieving 90 percent recall at approximately 30 percent review effort in standard binary-relevance reviews. That 70 percent reduction in documents reviewed translates directly into lower review budgets.
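The cost arithmetic above can be made concrete with a back-of-the-envelope calculation. This Python sketch uses the entry's own illustrative figures: two million documents, $1 to $5 per document for contract review, and roughly 30 percent review effort under CAL. The function name `review_cost` is hypothetical. Real budgets also include processing, hosting, QC, and privilege review, none of which are modeled here.

```python
def review_cost(num_docs: int, rate_per_doc: float, fraction_reviewed: float = 1.0) -> float:
    """Cost of eyes-on review of `fraction_reviewed` of a collection at a flat per-document rate."""
    return num_docs * fraction_reviewed * rate_per_doc


COLLECTION = 2_000_000  # documents in the hypothetical large commercial dispute
CAL_EFFORT = 0.30       # assumed share of the collection reviewed before CAL stops

for rate in (1.0, 5.0):  # contract-reviewer rate range quoted above
    linear = review_cost(COLLECTION, rate)           # review every document
    cal = review_cost(COLLECTION, rate, CAL_EFFORT)  # review only CAL's top-ranked share
    print(f"${rate:.0f}/doc: linear ${linear:,.0f}, CAL ${cal:,.0f}, saved ${linear - cal:,.0f}")
```

At $1 per document this prints a linear cost of $2,000,000 against $600,000 under CAL. At $5 per document it prints $10,000,000 against $3,000,000.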

Beyond cost, CAL bears on completeness obligations. An attorney who signs a discovery response under FRCP Rule 26(g) certifies that the response was formed after a reasonable inquiry and is consistent with the rules. CAL's running recall metrics, visible in real time in platforms like Relativity Active Learning, give counsel a quantitative basis for that certification rather than relying on sampling estimates alone.

Courts have progressively accepted TAR, including CAL, as consistent with the proportionality standard now codified in FRCP Rule 26(b)(1), which requires discovery to be proportional to the needs of the case. In Rio Tinto PLC v. Vale S.A. (S.D.N.Y. 2015), the court approved the parties' stipulated TAR protocol and discussed CAL. It also observed that it is now black letter law that where the producing party wants to use TAR, courts will permit it. Courts have generally stopped short of compelling a producing party to use TAR (Hyles v. New York City, S.D.N.Y. 2016). Even so, in large productions parties should be prepared to explain their choice of review method.

How It Works (Technical)

Think of CAL as a feedback loop between human reviewers and a ranking algorithm. At the start of a review, the platform has no basis for ranking documents other than metadata or keyword search results. A reviewer codes the first document — relevant or not relevant — and that single coding event teaches the model something: these words, these custodians, this date range, these email threads correlate with the human's judgment of relevance.

The model immediately re-ranks all uncoded documents by predicted relevance probability. The next document presented to the reviewer is now the one the model estimates is most likely relevant, not a random draw from the collection. The reviewer codes it. The model updates again. This cycle repeats for every document coded by every reviewer working concurrently on the matter.
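The feedback loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation:

- **Model:** a bag-of-words logistic regression, refit from scratch by stochastic gradient descent on every decision made so far.
- **Reviewer:** a `reviewer` callback stands in for the human coder.
- **Names:** every function name (`cal_review`, `train`, `score`, `tokens`) is hypothetical.

Production platforms use richer features and incremental updates rather than full refits.

```python
import math


def tokens(text: str) -> set[str]:
    """Bag-of-words features: the set of lowercase words in a document."""
    return set(text.lower().split())


def score(weights: dict[str, float], bias: float, doc: str) -> float:
    """Predicted probability that `doc` is relevant under a logistic model."""
    z = bias + sum(weights.get(t, 0.0) for t in tokens(doc))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled: list[tuple[str, int]], epochs: int = 20, lr: float = 0.5):
    """Fit logistic regression by stochastic gradient descent on every decision so far."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for doc, label in labeled:
            err = label - score(weights, bias, doc)  # log-loss gradient w.r.t. the logit
            bias += lr * err
            for t in tokens(doc):
                weights[t] = weights.get(t, 0.0) + lr * err
    return weights, bias


def cal_review(collection: list[str], reviewer, budget: int, seed_index: int = 0):
    """Present one document at a time; retrain and re-rank after every coding decision."""
    uncoded = set(range(len(collection)))
    labeled: list[tuple[str, int]] = []
    order: list[tuple[int, int]] = []
    next_doc = seed_index  # e.g. a keyword-search hit chosen to start the review
    for _ in range(min(budget, len(collection))):
        uncoded.discard(next_doc)
        label = reviewer(next_doc)  # the human decision: 1 = relevant, 0 = not relevant
        labeled.append((collection[next_doc], label))
        order.append((next_doc, label))
        if not uncoded:
            break
        weights, bias = train(labeled)
        # The next document shown is the uncoded one with the highest predicted relevance.
        next_doc = max(sorted(uncoded), key=lambda i: score(weights, bias, collection[i]))
    return order


docs = [
    "merger pricing memo from cfo",
    "lunch order friday",
    "pricing discussion merger board",
    "fantasy football league",
    "merger timeline pricing draft",
    "parking garage notice",
]
truth = {0: 1, 2: 1, 4: 1}  # simulated reviewer judgments
print(cal_review(docs, lambda i: truth.get(i, 0), budget=3))
```

In this toy run, the three documents that share vocabulary with the first relevant hit are the first three presented. That is the prioritization behavior the paragraph describes.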

The mathematical backbone is typically a variant of logistic regression or a gradient-boosted classifier trained on document feature vectors — word frequencies, metadata fields, thread structure, custodian identity. More recent platforms use transformer-based embeddings to capture semantic similarity beyond exact keyword matching. What distinguishes CAL from a one-time training run is that the model's weights are updated continuously, not once at the end of a training phase.
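One common way to turn word frequencies and metadata into a document feature vector is the hashing trick, which maps every named feature to an index in a fixed-size sparse vector. This sketch illustrates that technique only. The entry does not say which featurization any vendor uses. The vector size `DIM`, the `word:`, `custodian:`, and `thread:` prefixes, and the function names are all hypothetical. Transformer embeddings, mentioned above, would replace this step entirely.

```python
import hashlib
from collections import Counter

DIM = 2 ** 20  # assumed size of the hashed feature space


def bucket(feature: str) -> int:
    """Map a named feature to a stable index (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % DIM


def featurize(body: str, custodian: str, thread_id: str) -> dict[int, float]:
    """Sparse feature vector: word frequencies plus custodian and thread indicators."""
    vec: Counter = Counter()
    for word in body.lower().split():
        vec[bucket("word:" + word)] += 1.0                # word frequency
    vec[bucket("custodian:" + custodian.lower())] += 1.0  # who held the document
    vec[bucket("thread:" + thread_id)] += 1.0             # which email thread it belongs to
    return dict(vec)


features = featurize("pricing memo on pricing", custodian="J. Smith", thread_id="T-0417")
print(features)
```

A linear model such as logistic regression then learns one weight per index. A custodian who consistently holds relevant material therefore pushes predictions up just as a telling word does.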

Recall vs. precision: the metrics lawyers must understand. Recall measures what fraction of all genuinely relevant documents were found. Precision measures what fraction of the documents coded as relevant actually are relevant. The two trade off against each other. A strategy that codes every document as relevant has 100 percent recall, but its precision equals the collection's prevalence of relevant documents, which is often only a few percent. CAL optimizes for recall because producing parties are obligated to find relevant documents, not to avoid false positives (which are simply coded out during review). Counsel should agree on a target recall rate. Typically 75 to 85 percent is defensible in most federal courts; 90 percent or above is appropriate for matters where adverse inferences for incomplete production are a serious risk.
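Both metrics reduce to simple set arithmetic. This sketch assumes a hypothetical collection of 100,000 documents with 5 percent prevalence. It shows why coding everything as relevant yields perfect recall but precision equal to prevalence. The function name `recall_precision` is illustrative.

```python
def recall_precision(flagged: set[int], relevant: set[int]) -> tuple[float, float]:
    """Recall = share of relevant docs that were flagged; precision = share of flagged docs
    that are relevant. Both sets must be non-empty."""
    true_positives = len(flagged & relevant)
    return true_positives / len(relevant), true_positives / len(flagged)


collection = set(range(100_000))  # hypothetical collection
relevant = set(range(5_000))      # 5 percent prevalence, by assumption

# Coding everything as relevant: perfect recall, precision equal to prevalence.
print(recall_precision(collection, relevant))  # (1.0, 0.05)

# Flagging 4,000 relevant documents plus 1,000 false positives.
flagged = set(range(4_000)) | set(range(5_000, 6_000))
print(recall_precision(flagged, relevant))  # (0.8, 0.8)
```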

CAL reaches a stopping point when the marginal yield (the fraction of newly reviewed documents that are actually relevant) drops below a predetermined threshold, typically two to five percent. At that point, reviewing additional documents produces diminishing returns. The platform's recall estimate, confirmed by a validation sample, then gives counsel a statistical basis for certifying completeness.
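A marginal-yield stopping rule can be expressed as a trailing-window check over the sequence of coding decisions. This sketch assumes the following:

- **Yield:** measured over the last `window` documents reviewed.
- **Defaults:** the 500-document window and the 2 percent threshold are hypothetical. The threshold sits at the low end of the range cited above.

Platforms define and smooth yield in their own ways, and this rule alone does not establish recall.

```python
def stopping_point(codings: list[int], window: int = 500, threshold: float = 0.02):
    """Return how many documents had been reviewed when the trailing-window yield first
    fell below `threshold`, or None if it never did. `codings` holds 1 (relevant) or 0
    (not relevant) in review order."""
    relevant_in_window = 0
    for i, code in enumerate(codings):
        relevant_in_window += code
        if i >= window:
            relevant_in_window -= codings[i - window]  # drop the decision that left the window
        if i + 1 >= window and relevant_in_window / window < threshold:
            return i + 1
    return None


# Simulated review: a rich early phase (50% relevant), then a sparse tail (1% relevant).
simulated = [1, 0] * 500 + ([1] + [0] * 99) * 20
print(stopping_point(simulated))  # 1491
```

The rule fires only once the window has slid almost entirely into the sparse tail. A brief lull in an otherwise productive review therefore does not end it prematurely.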

How Legal AI Vendors Address It

Relativity Active Learning is the dominant CAL implementation in large-matter eDiscovery. It integrates directly with the Relativity review platform's full ecosystem: tagging, privilege logging, batch coding, QC workflows, and production. The platform provides real-time recall estimation graphs and detailed project reports suitable for inclusion in discovery meet-and-confer correspondence. Relativity Active Learning requires a Relativity Certified Administrator to configure, and setup costs for complex review protocols — multiple issue tags, privilege screening, foreign-language document handling — can be substantial. It is not a self-service tool.

Everlaw offers cloud-native CAL with a faster onboarding path than Relativity for matters under one million documents. Its collaborative review interface and real-time analytics dashboard make it practical for litigation teams that lack a dedicated eDiscovery project manager. The limitation is configurability: Everlaw's CAL implementation is less granular than Relativity's for complex multi-issue reviews requiring separate models per issue or for matters with non-standard document types such as databases or structured data.

Logikcull simplifies CAL for mid-market commercial litigation. The platform is designed for attorneys who need AI-assisted review without engaging an eDiscovery vendor. Control over model parameters is limited — attorneys cannot tune the stopping criterion or inspect feature weights — which makes Logikcull less appropriate for bet-the-company matters where the opposing party may challenge the review protocol.

Casepoint has particular strength in government-sector and FOIA review use cases, where document volumes are large, relevance standards are statutory rather than case-specific, and cost controls are a procurement requirement. Its CAL implementation is certified for several federal agency workflows. The platform's commercial litigation feature set lags Relativity and Everlaw in some advanced analytics functions.

Reveal AI is a newer entrant that has invested in model transparency features: reviewers can inspect which document features most influenced a relevance prediction, which is useful when responding to opposing counsel's challenges to the review protocol. The platform's market share is smaller than that of the incumbents, which means fewer experienced project managers in the eDiscovery services ecosystem have hands-on Reveal experience.

How Lawyers Should Verify and Apply CAL

  1. Define relevance before review begins. CAL learns from reviewer decisions. If different reviewers apply different relevance standards, the model learns a blended standard that reflects no one's actual intent. Draft a written relevance protocol — one to two pages, specific to the matter — before the first document is coded. This document also serves as evidence that the review was conducted in good faith if challenged.

  2. Agree on a recall target and document it. Before issuing or receiving productions, record the agreed recall target in the discovery protocol or a stipulated order. Several courts publish model ESI orders that parties can adapt to include a recall target; one example is the Northern District of California's model stipulated order re: discovery of electronically stored information. A written agreement eliminates disputes at the completion-certification stage.

  3. Monitor the yield curve, not just the document count. Platforms display a declining yield curve as CAL progresses. Do not stop review simply because a fixed number of documents have been reviewed. Stop when the marginal yield drops to the agreed threshold and the recall estimate crosses the agreed target with a confidence interval that satisfies the matter's risk profile.

  4. Run a validation sample before certifying production. After CAL stops, draw a random sample of 500 to 1,000 documents from the not-relevant population and have a senior reviewer code them independently. If the elusion rate — the fraction of genuinely relevant documents in the not-relevant set — is within the agreed tolerance, the production can be certified. Document the validation methodology and results in a memo to file.

  5. For multi-issue reviews, run separate CAL models per issue. A document that is relevant to claim A but not to claim B will confuse a single CAL model. Platforms that support issue-level CAL should be configured with a separate active learning project for each distinct relevance issue. This increases setup complexity but produces far more defensible results in cases where issue-specific production is required.
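The validation step (item 4) can be sketched as an elusion calculation with a confidence interval. This is a hedged sketch under stated assumptions:

- **Interval method:** it uses a Wilson score interval, which is one of several accepted binomial intervals; platforms may use others.
- **Recall known exactly:** it assumes the count of relevant documents found by CAL is known exactly, because each was human-coded.
- **Illustrative values:** the matter sizes and sample results are hypothetical, as are the function names `wilson_interval` and `validate`.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95% confidence."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def validate(sample_codes: list[int], null_set_size: int, relevant_found: int) -> dict:
    """Elusion test on a random sample drawn from the not-relevant population."""
    n = len(sample_codes)
    hits = sum(sample_codes)               # relevant documents the senior reviewer found
    elusion = hits / n
    low, high = wilson_interval(hits, n)
    missed = elusion * null_set_size       # point estimate of relevant documents left behind
    worst_missed = high * null_set_size    # conservative estimate from the interval's upper end
    return {
        "elusion": elusion,
        "elusion_interval": (low, high),
        "estimated_recall": relevant_found / (relevant_found + missed),
        "recall_lower_bound": relevant_found / (relevant_found + worst_missed),
    }


# Hypothetical matter: 40,000 relevant documents found by CAL, 1,400,000 documents in the
# not-relevant population, and a 1,000-document validation sample containing 3 relevant ones.
sample = [1] * 3 + [0] * 997
report = validate(sample, null_set_size=1_400_000, relevant_found=40_000)
print(report)
```

Here the point estimate of recall is about 90 percent (40,000 / 44,200). The conservative bound drawn from the upper end of the elusion interval is closer to 76 percent. The agreed protocol should therefore state which of the two figures the recall target applies to.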