Q1: How do I find out what a legal AI was trained on?

Ask the vendor directly. Some vendors publish model cards or technical documentation describing training data composition. Others treat this as proprietary. At minimum, ask about: the training data cutoff date, whether client-submitted documents are used for training, and whether the training corpus includes the specific practice area and jurisdiction relevant to your work.
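The minimum questions above can be tracked as a simple record per vendor. The following is a minimal sketch, not a standard due-diligence format: the `VendorDisclosure` class, its field names, the `open_questions` helper, and the example vendor and dates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VendorDisclosure:
    """What a vendor has disclosed so far; None means 'not yet answered'."""
    vendor: str
    training_cutoff: Optional[date] = None
    trains_on_client_documents: Optional[bool] = None
    covered_jurisdictions: Optional[List[str]] = None
    covered_practice_areas: Optional[List[str]] = None


def open_questions(d: VendorDisclosure, jurisdiction: str, practice_area: str) -> List[str]:
    """Return the follow-up questions still outstanding for this vendor."""
    questions = []
    if d.training_cutoff is None:
        questions.append("What is the training data cutoff date?")
    if d.trains_on_client_documents is None:
        questions.append("Are client-submitted documents used for training?")
    if d.covered_jurisdictions is None or jurisdiction not in d.covered_jurisdictions:
        questions.append(f"Does the training corpus cover {jurisdiction}?")
    if d.covered_practice_areas is None or practice_area not in d.covered_practice_areas:
        questions.append(f"Does the training corpus cover {practice_area} law?")
    return questions


disclosure = VendorDisclosure(
    vendor="ExampleVendor",              # hypothetical vendor name
    training_cutoff=date(2024, 12, 31),  # made-up date for illustration
    covered_jurisdictions=["Delaware"],
)
for question in open_questions(disclosure, "California", "employment"):
    print(question)
```

Leaving an answer as `None` rather than guessing keeps "not disclosed" distinct from "disclosed as no", which is the distinction that matters when a vendor treats its training data as proprietary.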

Q2: Can a model be retrained to include new case law?

Retraining an entire foundation model is extremely resource-intensive. Most vendors address the knowledge cutoff problem through RAG (retrieving current legal content at query time) rather than continuous retraining. Fine-tuning on new data is more feasible for specific capability improvements but is still a significant engineering undertaking.
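To make the RAG idea concrete, here is a minimal sketch of retrieval at query time, assuming a toy in-memory corpus and plain word-overlap scoring. Production tools typically use search indexes or embeddings instead; the `Passage` class, the function names, and the case names and holdings in `CORPUS` are invented for illustration and do not describe any vendor's system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class Passage:
    citation: str  # hypothetical citation label
    decided: date  # date of the source
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:()?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: List[Passage], k: int = 2) -> List[Passage]:
    """Return up to k passages sharing the most words with the query.

    Ties on word overlap are broken in favor of the newer passage.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p.decided, p) for p in corpus]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [p for _, _, p in scored[:k]]


def build_prompt(query: str, passages: List[Passage]) -> str:
    """Place the retrieved text in the prompt so the model can rely on it."""
    sources = "\n".join(
        f"[{p.citation}, {p.decided.isoformat()}] {p.text}" for p in passages
    )
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Invented passages; the 2025 entry stands in for content newer than a model's cutoff.
CORPUS = [
    Passage("Doe v. Roe (hypothetical)", date(2025, 3, 1),
            "A non-compete clause longer than two years is unenforceable."),
    Passage("Acme v. Beta (hypothetical)", date(2019, 6, 10),
            "A non-compete clause must be reasonable in scope."),
    Passage("Smith v. Jones (hypothetical)", date(2021, 1, 5),
            "Landlords must return security deposits within thirty days."),
]

query = "Is a three-year non-compete clause enforceable?"
print(build_prompt(query, retrieve(query, CORPUS)))
```

The model's weights never change in this flow. Newer material reaches the answer only because it is retrieved and placed in the prompt, which is why the currency of the retrieval corpus matters as much as the training cutoff.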

Q3: If a tool was trained on our firm's documents without permission, what are the legal implications?

Training a model on copyright-protected text without authorization may infringe copyright. Training on confidential client documents could breach confidentiality obligations. Lawyers should confirm in vendor agreements that client-submitted content is not used for model training. If a vendor trained on improperly obtained legal documents, that may also affect the reliability and legal standing of the model's outputs.

Last reviewed: 2026-05-19 by LawyerAI Editorial Team.

Training Data

Training data is the corpus of text and examples used to train a large language model, establishing its capabilities, knowledge, and limitations; the quality, recency, and composition of training data directly affect the model's reliability for legal tasks.

Last reviewed: 2026-05-19



Related Concepts

Tech / Model

LLM (Large Language Model)

A large language model (LLM) is an AI system trained on large volumes of text data to predict and generate human-like text; it serves as the core engine underlying most legal AI tools for research, drafting, and document analysis.

Tech / Model

Fine-tuning

Fine-tuning is the process of further training a pre-trained large language model on a domain-specific dataset to improve its performance on tasks in that domain, such as legal document analysis, contract drafting, or jurisdiction-specific research.

Tech / Model

Model Card (AI Transparency)

A model card is a structured disclosure document that describes an AI model's intended uses, performance metrics, training data, and known limitations so that the model can be evaluated on an informed basis.

Tech / Model

AI Bias (Legal Context)

AI bias in legal contexts refers to systematic errors or disparate outcomes in AI model outputs caused by imbalances in training data, model design, or task framing — potentially producing results that disadvantage certain parties, jurisdictions, or case types.

Related Tools

Westlaw Precision AI
AI-powered legal research with citation-validated answers from Westlaw.
Lexis+ AI
Conversational legal research with real-time Shepard's citation validation.
Kira Systems
AI clause extraction and due diligence trusted by Am Law 100 firms.
Luminance
Enterprise AI for portfolio-level contract analysis and institutional memory.
Harvey AI
The most expensive legal AI on the market, for Am Law 100 firms only.

Related Comparisons

Kira Systems vs Luminance: Enterprise Contract Analysis Compared


