LawyerAI Independent Reviews

LLM (Large Language Model)

A large language model (LLM) is an AI system trained on large volumes of text data to predict and generate human-like text; it serves as the core engine underlying most legal AI tools for research, drafting, and document analysis.

Last reviewed: 2026/05/19


Frequently Asked Questions

Q1: What is the difference between an LLM and a traditional legal research database?
A traditional legal database is a structured repository that retrieves documents matching search queries. An LLM generates new text based on statistical patterns learned during training — it does not retrieve documents but rather produces text that resembles authoritative answers. Legal research AI tools typically combine both: using LLMs for language understanding and generation, and databases for sourcing verified legal content.
Q2: Are LLMs trained on confidential client information?
It depends on the tool and the contract. General-purpose foundation models are pre-trained largely on publicly available data. When a lawyer submits client documents to a legal AI tool, whether that data may be used for model training depends on the vendor's data processing terms. Most enterprise legal AI vendors explicitly exclude submitted content from model training, but this should be confirmed in the vendor agreement before any client-confidential material is submitted.
Q3: How often do legal AI tools update their underlying LLM?
Update frequency varies by vendor and is not always disclosed publicly. Developers periodically update or replace their foundation models, and legal AI vendors may move their applications to newer model versions. The legal content corpus used in retrieval-augmented generation (RAG) applications, however, is typically updated more often than the underlying model weights.

Related Concepts

Tech / Model

Hallucination (in Legal AI)

Hallucination in legal AI refers to instances where an AI model generates factually incorrect, fabricated, or unsupported output — such as nonexistent case citations, invented statutes, or inaccurate summaries of legal holdings — presented with apparent confidence.

Tech / Model

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is an AI architecture that combines a retrieval system — which fetches relevant documents from a specified corpus — with a generative language model that produces answers grounded in those retrieved documents, rather than relying solely on the model's training data.

Tech / Model

Fine-tuning

Fine-tuning is the process of further training a pre-trained large language model on a domain-specific dataset to improve its performance on tasks in that domain, such as legal document analysis, contract drafting, or jurisdiction-specific research.

Tech / Model

Training Data

Training data is the corpus of text and examples used to train a large language model, establishing its capabilities, knowledge, and limitations; the quality, recency, and composition of training data directly affect the model's reliability for legal tasks.

Related Tools

  • Harvey AI

    The most expensive legal AI on the market (Am Law 100 firms only).

  • CoCounsel

    Thomson Reuters' GPT-backed research and drafting with Westlaw integration.

  • Westlaw Precision AI

    AI-powered legal research with citation-validated answers from Westlaw.

  • Lexis+ AI

    Conversational legal research with real-time Shepard's citation validation.

  • Spellbook

    AI contract drafting and review inside Microsoft Word for transactional lawyers.

Related Comparisons

  • CoCounsel vs Westlaw Precision AI: Same Company, Different Products

Related Reading

  • How We Score Legal AI Tools: The 5-Dimension Methodology
  • AI Hallucination in Legal Research: A Practitioner's Guide

Last reviewed: 2026/05/19. Definitions are written by the LawyerAI Editorial team. We do not accept affiliate commissions; Featured placement is clearly labeled and does not influence editorial content.



Why It Matters for Lawyers

LLMs are the technical foundation of the current generation of legal AI tools. Understanding what an LLM is — and what it is not — helps lawyers calibrate their reliance on AI-assisted work product.

An LLM does not "know" the law the way a lawyer does. It learns statistical patterns from training text, which enables it to generate text that resembles authoritative legal analysis. It does not reason from principles; it predicts likely text. This distinction matters when an LLM produces a confident-sounding answer about a legal question: the confidence reflects pattern-matching, not verified accuracy.
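The phrase "predicts likely text" can be made concrete with a deliberately tiny analogy. The Python sketch below is a bigram model: it counts which word follows which in a few invented training sentences, then picks the most frequent continuation. Real LLMs use neural networks over subword tokens and condition on far more context, so treat this as an illustration of the principle under that simplification, not a description of any vendor's model. The function names and training sentences are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data"; real models train on vastly larger corpora.
TRAINING_TEXT = [
    "the court held that the contract was void",
    "the court held that the clause was unenforceable",
    "the court held that the statute applied",
    "the court found that the contract was valid",
]


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent next word and its relative frequency.

    The "confidence" is only a share of observed continuations;
    nothing here checks whether the resulting statement is true.
    """
    followers = counts.get(word)
    if not followers:
        return None, 0.0
    nxt, n = followers.most_common(1)[0]
    return nxt, n / sum(followers.values())


def generate(counts, start, max_words=8):
    """Greedily extend a prompt one predicted word at a time."""
    words = start.split()
    for _ in range(max_words):
        nxt, _conf = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


if __name__ == "__main__":
    model = train_bigrams(TRAINING_TEXT)
    print(predict_next(model, "court"))
    print(generate(model, "the court"))
```

On this toy data, `predict_next(model, "court")` returns `("held", 0.75)` simply because "held" followed "court" in three of four sentences, and greedy generation from "the court" cycles into "the court held that the court held that the court": grammatical-looking text produced purely from word frequencies.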

The practical implications for lawyers follow from this. LLMs perform well on tasks where the correct output resembles patterns in the training data: drafting standard commercial clauses, summarizing clearly structured documents, and explaining well-established legal doctrines. They are less reliable on tasks that require precise factual recall (such as the exact text of a citation), novel legal reasoning, or jurisdiction-specific analysis that is poorly represented in the training data.

The legal AI market is built largely on foundation models (the GPT-4 family, Claude, Gemini, and others), with varying degrees of legal specialization, fine-tuning, and retrieval augmentation layered on top. The base model matters, but the legal-specific engineering applied to it often matters more for task-specific performance.

How AI Tools Handle It

Most legal AI vendors do not train their own foundation models; the compute and data requirements are prohibitive. Instead, they build on foundation models from Anthropic, OpenAI, Google, Meta, or others, applying legal-specific fine-tuning, prompt engineering, and retrieval augmentation to improve performance on legal tasks.

Harvey AI, for example, is built on OpenAI's models with legal-specific tuning and integrations. CoCounsel was built on OpenAI's GPT-4 combined with Casetext's legal research infrastructure. Westlaw Precision AI and Lexis+ AI connect foundation models to their respective legal content databases through a RAG architecture.
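Vendors do not publish their pipelines, so the following is only a generic sketch of the retrieve-then-generate (RAG) pattern, under loudly stated assumptions: a hypothetical three-document corpus stands in for a legal content database, word overlap stands in for real search or embedding-based ranking, and `llm` is a placeholder for whichever foundation model a tool calls. `Document`, `retrieve`, `build_grounded_prompt`, and `answer` are illustrative names, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical mini-corpus standing in for a vendor's verified legal content database.
CORPUS = [
    Document("doc-1", "A contract requires offer, acceptance, and consideration."),
    Document("doc-2", "A non-compete clause may be unenforceable if it is unreasonable in scope."),
    Document("doc-3", "The statute of limitations for written contracts varies by jurisdiction."),
]

# Common words ignored when matching; a crude stand-in for real relevance ranking.
STOPWORDS = {"a", "an", "and", "by", "for", "if", "in", "is", "it", "of", "the", "to"}


def tokenize(text: str) -> set[str]:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = {w.strip(".,;:?!").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Step 1: rank documents by word overlap with the query and keep the top k matches."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d.text)), d) for d in corpus]
    matches = [(score, d) for score, d in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matches[:k]]


def build_grounded_prompt(query: str, docs: list[Document]) -> str:
    """Step 2: instruct the model to answer only from the retrieved sources."""
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the sources below and cite them by ID. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


def answer(query: str, corpus: list[Document], llm: Callable[[str], str]) -> str:
    """Retrieve, then generate. `llm` is a placeholder for any prompt-to-text model call."""
    docs = retrieve(query, corpus)
    return llm(build_grounded_prompt(query, docs))


if __name__ == "__main__":
    # Echo "model" so the sketch runs without any external service.
    print(answer("Is a non-compete clause enforceable?", CORPUS, llm=lambda prompt: prompt))
```

For a question about non-compete clauses, `retrieve` returns only `doc-2`, so that single source is all the model is shown. The grounding instruction is meant to reduce, not eliminate, the chance that the model answers from its training data instead of the retrieved sources, which is why the quality of the retrieval corpus matters as much as the base model.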

The degree to which a tool discloses its underlying LLM and model architecture varies: some vendors name the base model, while others treat it as proprietary. Knowing the architecture helps lawyers assess hallucination risk and data privacy implications, including which companies process client data on its way to and from the LLM.

No LLM produces perfect legal output. All require attorney verification before the output is used in client work.
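Part of that verification can be made systematic. As a minimal sketch, assuming a hypothetical index of citations the reviewer has already confirmed, the Python snippet below pulls reporter-style citations out of a draft and lists any that are not in the index, a first screen for the fabricated citations described under Hallucination. The regular expression is deliberately narrow (a few federal reporters only) and the function names are illustrative.

```python
import re

# Hypothetical, deliberately narrow pattern: "<volume> <reporter> <page>" for a few
# federal reporters only. Real citation formats are far more varied.
CITATION_RE = re.compile(r"\b\d{1,4} (?:U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)) \d{1,5}\b")


def find_citations(text):
    """Extract candidate citations from model output, in order of appearance."""
    return CITATION_RE.findall(text)


def flag_unverified(text, verified_index):
    """Return citations that do not appear in a trusted index.

    An empty result does not mean the output is correct: a real citation can
    still be quoted or characterized inaccurately.
    """
    return [c for c in find_citations(text) if c not in verified_index]


if __name__ == "__main__":
    draft = "See 550 U.S. 544 and 556 U.S. 662; compare 123 F.4th 4567."
    verified = {"550 U.S. 544", "556 U.S. 662"}
    print(flag_unverified(draft, verified))
```

Here the third citation is flagged only because it is absent from the index, not because the script knows whether it exists; anything flagged, and anything not flagged, still goes to an attorney for review.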