Continuous Active Learning (CAL) is an eDiscovery review methodology in which a machine-learning model continuously updates its document relevance predictions as reviewers code each document, immediately re-prioritizing the review queue to surface the most likely-relevant documents next. Unlike earlier predictive coding approaches, CAL has no defined training phase and no frozen model — every reviewer decision is a new training signal that reshapes the ranked document list in real time.
CAL emerged as a practical response to a structural weakness in first-generation Technology Assisted Review (TAR 1.0): the requirement to train the model on a fixed "seed set" before deployment meant that the model reflected the assumptions baked into that initial batch, with no mechanism to correct course as review progressed. CAL eliminates that constraint. The model learns from the entire body of reviewer decisions accumulated since the review began, not just a pre-selected subset, and it never stops learning until the review is closed.
Document review is the single largest cost driver in civil litigation. Studies conducted by the RAND Institute for Civil Justice estimate that attorneys spend between 50 and 70 percent of total litigation budgets reviewing documents, at per-document rates ranging from $1 to $5 for contract reviewers and $10 to $50 or more for attorney-level review on sensitive privilege determinations. In a large commercial dispute generating two million documents, that arithmetic produces review bills that can exceed the value of the underlying dispute.
CAL attacks this problem directly. Research by Maura Grossman and Gordon Cormack, whose work was cited approvingly by the court in Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012) — the first federal decision to approve predictive coding for discovery — demonstrated that well-implemented TAR and CAL processes consistently achieve recall rates above 75 percent while reviewing only 20 to 40 percent of the document collection. Follow-on studies through 2024 show CAL specifically achieving 90 percent recall at approximately 30 percent review effort in standard binary-relevance reviews. That 70 percent reduction in documents reviewed translates directly to budget.
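The arithmetic behind that claim can be sketched in a few lines. This is a rough illustration only: it uses the two-million-document collection, the contract-reviewer rate range, and the approximately 30 percent review effort quoted above, and it assumes a flat per-document rate. The function name `review_cost` is hypothetical, and the sketch ignores platform fees, quality control, and privilege review.

```python
def review_cost(total_docs: int, fraction_reviewed: float, rate_per_doc: float) -> float:
    """Cost of reviewing a fraction of a collection at a flat per-document rate."""
    return total_docs * fraction_reviewed * rate_per_doc


TOTAL_DOCS = 2_000_000   # collection size from the example above
CAL_FRACTION = 0.30      # approximate review effort cited above for 90 percent recall

for rate in (1.00, 5.00):  # contract-reviewer range cited above, in dollars
    linear = review_cost(TOTAL_DOCS, 1.0, rate)       # every document reviewed
    cal = review_cost(TOTAL_DOCS, CAL_FRACTION, rate)  # CAL-prioritized review
    print(f"${rate:.2f}/doc: linear ${linear:,.0f}, CAL ${cal:,.0f}, "
          f"difference ${linear - cal:,.0f}")
```

The difference scales linearly with both collection size and reviewer rate, which is why the savings matter most on large matters and attorney-level review.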
Beyond cost, CAL affects completeness obligations. A producing party that certifies a document production under FRCP Rule 26(g) is certifying that the response is complete and correct to the best of the attorney's knowledge. CAL's continuous recall metrics — visible in real time in platforms like Relativity Active Learning — give counsel a quantitative basis for making that certification rather than relying on sampling estimates alone.
Courts have progressively accepted CAL as meeting the proportionality standard codified in FRCP Rule 26(b)(1), which requires discovery to be proportional to the needs of the case. In Rio Tinto PLC v. Vale S.A. (S.D.N.Y. 2015), the court approved the parties' stipulated TAR protocol and observed that when the methodology uses continuous active learning, the contents of the seed set are much less significant. By 2025, federal magistrate judges in technology-heavy jurisdictions routinely expect parties to explain why they are not using some form of machine-assisted review in large document productions.
How It Works (Technical)
Think of CAL as a feedback loop between human reviewers and a ranking algorithm. At the start of a review, the platform has no basis for ranking documents other than metadata or keyword search results. A reviewer codes the first document — relevant or not relevant — and that single coding event teaches the model something: these words, these custodians, this date range, these email threads correlate with the human's judgment of relevance.
The model immediately re-ranks all uncoded documents by predicted relevance probability. The next document presented to the reviewer is now the one the model estimates is most likely relevant, not a random draw from the collection. The reviewer codes it. The model updates again. This cycle repeats for every document coded by every reviewer working concurrently on the matter.
The mathematical backbone is typically a variant of logistic regression or a gradient-boosted classifier trained on document feature vectors — word frequencies, metadata fields, thread structure, custodian identity. More recent platforms use transformer-based embeddings to capture semantic similarity beyond exact keyword matching. What distinguishes CAL from a one-time training run is that the model's weights are updated continuously, not once at the end of a training phase.
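The feedback loop described above can be sketched as a toy program. This is a minimal illustration under stated assumptions, not any vendor's implementation: it uses bag-of-words counts as features, a logistic regression updated by one gradient step per coding decision, and a stand-in `toy_reviewer` function in place of a human. All names (`OnlineLogisticModel`, `run_cal`, `DOCS`) are hypothetical, and production platforms typically retrain in batches over far richer features.

```python
import math
from typing import Callable


def features(text: str) -> dict[str, float]:
    """Bag-of-words term counts, a stand-in for richer production features."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


class OnlineLogisticModel:
    """Logistic regression updated by one gradient step per coding decision."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.weights: dict[str, float] = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, text: str) -> float:
        """Predicted probability that the document is relevant."""
        z = self.bias + sum(
            self.weights.get(term, 0.0) * count
            for term, count in features(text).items()
        )
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text: str, relevant: bool) -> None:
        """Nudge the weights toward the reviewer's decision."""
        error = (1.0 if relevant else 0.0) - self.predict(text)
        for term, count in features(text).items():
            self.weights[term] = (
                self.weights.get(term, 0.0) + self.learning_rate * error * count
            )
        self.bias += self.learning_rate * error


def run_cal(
    documents: dict[str, str],
    reviewer: Callable[[str], bool],
    seed_id: str,
) -> list[str]:
    """Code one document, update the model, re-rank the rest, repeat."""
    model = OnlineLogisticModel()
    uncoded = dict(documents)
    review_order: list[str] = []
    next_id = seed_id
    while uncoded:
        text = uncoded.pop(next_id)
        model.update(text, reviewer(text))  # the coding decision is the training signal
        review_order.append(next_id)
        if uncoded:
            # Present the document the model now ranks as most likely relevant.
            next_id = max(uncoded, key=lambda doc_id: model.predict(uncoded[doc_id]))
    return review_order


# Hypothetical five-document collection and a stand-in for a human reviewer.
DOCS = {
    "d1": "merger pricing memo",
    "d2": "lunch menu friday",
    "d3": "merger timeline draft",
    "d4": "parking garage notice",
    "d5": "board merger vote",
}


def toy_reviewer(text: str) -> bool:
    return "merger" in text


print(run_cal(DOCS, toy_reviewer, seed_id="d1"))
```

In this toy run the documents mentioning the merger are pulled to the front of the queue after a single relevant coding decision, which is the behavior the ranking loop is meant to produce.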
Recall vs. Precision: the metrics lawyers must understand. Recall measures what fraction of all genuinely relevant documents were found. Precision measures what fraction of the documents coded as relevant actually are relevant. These trade off against each other. A strategy that codes every document as relevant has 100 percent recall, but its precision is no better than the collection's richness (the share of documents that are genuinely relevant), which is often low. CAL optimizes for recall because producing parties are obligated to find relevant documents, not to avoid false positives (which are simply coded out during review). Courts and counsel should agree on a target recall rate: typically 75 to 85 percent is defensible in most federal courts; 90 percent or above is appropriate for matters where adverse inferences for incomplete production are a serious risk.
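A short worked example may make the two metrics concrete. The collection below is hypothetical (100,000 documents, 10,000 of them relevant), as are the review outcomes; only the definitions of recall and precision come from the text above.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all genuinely relevant documents that were found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the documents identified as relevant that really are relevant."""
    return true_positives / (true_positives + false_positives)


# Hypothetical collection: 100,000 documents, 10,000 genuinely relevant.
# Strategy A: treat every document as relevant.
print("code everything:",
      recall(true_positives=10_000, false_negatives=0),
      precision(true_positives=10_000, false_positives=90_000))

# Strategy B: a prioritized review surfaces 30,000 documents, 8,500 of them relevant.
print("prioritized review:",
      recall(true_positives=8_500, false_negatives=1_500),
      precision(true_positives=8_500, false_positives=21_500))
```

Strategy A finds everything but is no more precise than the collection itself; Strategy B gives up some recall in exchange for reviewing far fewer documents.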
CAL reaches a stopping point when the model's marginal yield (the fraction of newly reviewed documents that are actually relevant) drops below a predetermined threshold, typically two to five percent. At that point, reviewing additional documents produces diminishing returns, and the platform can generate the recall estimates that support a certification of completeness.
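The stopping rule can be expressed as a simple check over the most recent coding decisions. The sketch below is an assumption about how such a check might look, not a description of any platform's logic: the 200-document window and the 3 percent threshold are illustrative values (the threshold sits inside the two-to-five percent range mentioned above), and the function names are hypothetical.

```python
def marginal_yield(codes: list[bool], window: int) -> float:
    """Share of the most recent `window` coding decisions that were relevant."""
    recent = codes[-window:]
    if not recent:
        raise ValueError("no coding decisions recorded yet")
    return sum(recent) / len(recent)


def should_stop(codes: list[bool], window: int = 200, threshold: float = 0.03) -> bool:
    """True once a full window of recent decisions yields less than the threshold."""
    if len(codes) < window:
        return False  # not enough history to judge the yield
    return marginal_yield(codes, window) < threshold


# Hypothetical coding history: True means the reviewer coded the document relevant.
early = [True] * 120 + [False] * 80          # yield of 60 percent in the first 200
late = early + [False] * 196 + [True] * 4    # yield of 2 percent in the latest 200

print(should_stop(early))
print(should_stop(late))
```

A rolling window is used so that a single relevant document late in the review does not restart the clock, while a sustained drop in yield still triggers the stop.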
How Legal AI Vendors Address It
Relativity Active Learning is the dominant CAL implementation in large-matter eDiscovery. It integrates directly with the Relativity review platform's full ecosystem: tagging, privilege logging, batch coding, QC workflows, and production. The platform provides real-time recall estimation graphs and detailed project reports suitable for inclusion in discovery meet-and-confer correspondence. Relativity Active Learning requires a Relativity Certified Administrator to configure, and setup costs for complex review protocols — multiple issue tags, privilege screening, foreign-language document handling — can be substantial. It is not a self-service tool.
Everlaw offers cloud-native CAL with a faster onboarding path than Relativity for matters under one million documents. Its collaborative review interface and real-time analytics dashboard make it practical for litigation teams that lack a dedicated eDiscovery project manager. The limitation is configurability: Everlaw's CAL implementation is less granular than Relativity's for complex multi-issue reviews requiring separate models per issue or for matters with non-standard document types such as databases or structured data.
Logikcull simplifies CAL for mid-market commercial litigation. The platform is designed for attorneys who need AI-assisted review without engaging an eDiscovery vendor. Control over model parameters is limited — attorneys cannot tune the stopping criterion or inspect feature weights — which makes Logikcull less appropriate for bet-the-company matters where the opposing party may challenge the review protocol.
Casepoint has particular strength in government-sector and FOIA review use cases, where document volumes are large, relevance standards are statutory rather than case-specific, and cost controls are a procurement requirement. Its CAL implementation is certified for several federal agency workflows. The platform's commercial litigation feature set lags Relativity and Everlaw in some advanced analytics functions.
Reveal AI is a newer entrant that has invested in model transparency features: reviewers can inspect which document features most influenced a relevance prediction, which is useful when responding to opposing counsel's challenges to the review protocol. The platform's market share is smaller than the incumbents', which means fewer experienced project managers in the eDiscovery services ecosystem have hands-on Reveal experience.
How Lawyers Should Verify and Apply CAL
- Define relevance before review begins. CAL learns from reviewer decisions. If different reviewers apply different relevance standards, the model learns a blended standard that reflects no one's actual intent. Draft a written relevance protocol (one to two pages, specific to the matter) before the first document is coded. This document also serves as evidence that the review was conducted in good faith if challenged.
- Agree on a recall target and document it. Before issuing or receiving productions, record the agreed recall target in the discovery protocol or a stipulated order. Courts in the Southern District of New York and Northern District of California have published model ESI protocols that include recall target provisions. A written agreement eliminates disputes at the completion-certification stage.
- Monitor the yield curve, not just the document count. Platforms display a declining yield curve as CAL progresses. Do not stop review simply because a fixed number of documents have been reviewed. Stop when the marginal yield drops to the agreed threshold and the recall estimate crosses the agreed target with a confidence interval that satisfies the matter's risk profile.
- Run a validation sample before certifying production. After CAL stops, draw a random sample of 500 to 1,000 documents from the unreviewed, presumed not-relevant population and have a senior reviewer code them independently. If the elusion rate (the fraction of that population that is in fact relevant, as estimated from the sample) is within the agreed tolerance, the production can be certified. Document the validation methodology and results in a memo to file.
- For multi-issue reviews, run separate CAL models per issue. A document that is relevant to claim A but not to claim B will confuse a single CAL model. Platforms that support issue-level CAL should be configured with a separate active learning project for each distinct relevance issue. This increases setup complexity but produces far more defensible results in cases where issue-specific production is required.
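The validation-sample step in the list above lends itself to a short calculation. The sketch below is one way to do it, under stated assumptions: the counts are hypothetical, the Wilson score interval is one common choice for a binomial confidence interval rather than a method the text prescribes, and recall is estimated as relevant documents found divided by found plus the projected number missed. The function names are illustrative.

```python
import math


def elusion_rate(relevant_in_sample: int, sample_size: int) -> float:
    """Share of the validation sample that the senior reviewer coded relevant."""
    return relevant_in_sample / sample_size


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95 percent)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def estimated_recall(relevant_found: int, discard_size: int, elusion: float) -> float:
    """Recall implied by an elusion rate applied to the whole discard pile."""
    projected_missed = elusion * discard_size
    return relevant_found / (relevant_found + projected_missed)


# Hypothetical matter: 90,000 relevant documents found during review,
# 1,400,000 documents left unreviewed, 8 relevant in a 1,000-document sample.
rate = elusion_rate(relevant_in_sample=8, sample_size=1_000)
low, high = wilson_interval(successes=8, n=1_000)

print(f"elusion rate: {rate:.4f} (interval {low:.4f} to {high:.4f})")
print(f"point estimate of recall: {estimated_recall(90_000, 1_400_000, rate):.3f}")
print(f"conservative recall (upper elusion bound): "
      f"{estimated_recall(90_000, 1_400_000, high):.3f}")
```

Reporting the conservative figure alongside the point estimate shows how much the recall claim depends on sample size, which is useful context for the memo to file.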