A model card is a structured disclosure document that describes an AI model's intended uses, performance metrics, training data sources, evaluation methodology, and known limitations. The format was proposed in the 2019 paper "Model Cards for Model Reporting" by Margaret Mitchell, Timnit Gebru, and colleagues, most of them then at Google, and has since become the de facto standard for communicating what an AI system can and cannot reliably do.
For lawyers evaluating legal AI tools, a model card is the closest equivalent to a product specification sheet — except that most legal AI vendors publish partial versions, marketing-friendly summaries, or nothing at all. Reading and requesting model cards is a due diligence step that most legal buyers currently skip, creating adoption risk that is addressable with modest additional effort.
The practical reason for reading a model card before deploying a legal AI tool is professional responsibility. ABA Model Rule 1.1 (Competence), read with its Comment 8, expects lawyers to keep abreast of the benefits and risks of relevant technology, which includes the AI tools they use in client matters. A 2024 survey by the American Bar Association's Legal Technology Resource Center found that only 19% of lawyers who reported using AI tools for legal work had reviewed any technical documentation about those tools before deployment. That gap between adoption rate and documentation review represents a competence risk that state bars are increasingly likely to scrutinize.
There are three specific reasons a model card is relevant to legal work.
First, it identifies limitations by task type. An AI model that performs well on contract summarization may have substantially higher error rates on regulatory compliance analysis. Model cards with quantitative analyses report performance disaggregated by task, domain, and sometimes by input characteristics. A lawyer who learns before using a tool for historical research that its citation accuracy drops to, say, 73% on pre-2010 case law can build verification steps into the workflow. A lawyer who does not read the model card will discover the limitation in a client matter.
Second, it identifies training data boundaries. If a legal AI model was trained on data through a specific cutoff date, it will not reflect developments in case law, statutes, or regulations after that date. The cutoff should appear in the model card. For fast-moving regulatory areas — sanctions law, environmental compliance, employment law — a training cutoff as recent as 12 months ago can produce outdated analysis. The 2025 Stanford RegLab report on AI use in legal research found that training data recency was the single most frequently cited limitation by lawyers who had encountered AI-generated errors.
Third, it surfaces potential conflicts. If a legal AI vendor trained its model using data from law firms (including, potentially, firms that are clients, opposing parties, or competitors of your clients), the model card should disclose the general nature of the training data. Most do not disclose this in sufficient detail, which is itself a signal worth noting.
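The training-cutoff point in the second reason above can be made concrete with a small check. What follows is a minimal sketch, not a vendor feature: the function name `postdates_cutoff`, the cutoff date, and the listed authorities are all hypothetical values chosen for illustration. It flags any authority whose effective date falls after the model's stated cutoff, since the model cannot have seen it during training.

```python
from datetime import date


def postdates_cutoff(training_cutoff: date, authorities: dict[str, date]) -> list[str]:
    """Return (alphabetically) the authorities that took effect after the cutoff."""
    return sorted(
        name for name, effective in authorities.items() if effective > training_cutoff
    )


# Hypothetical cutoff taken from a vendor's model card.
cutoff = date(2024, 6, 30)

# Hypothetical authorities relied on in a matter, with their effective dates.
authorities = {
    "Sanctions rule amendment": date(2025, 2, 1),
    "Employment statute": date(2019, 1, 1),
    "Agency guidance update": date(2024, 11, 15),
}

needs_manual_check = postdates_cutoff(cutoff, authorities)
print(needs_manual_check)
```

Anything the check returns should be researched independently rather than taken from the tool's output. An empty result does not prove the analysis is current, because later case law can change how an older authority is applied.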
The regulatory context is shifting. Article 53 of the EU AI Act imposes technical documentation requirements on providers of general-purpose AI (GPAI) models, requirements that functionally mandate model card-equivalent disclosures. GPAI models classified as posing systemic risk carry additional obligations under Article 55. For legal AI tools marketed to EU lawyers, these obligations are live or imminently live as of 2026: the GPAI provisions began to apply in August 2025. US-based vendors selling to EU firms are affected regardless of where the vendor is headquartered.
How It Works (Technical)
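The Mitchell et al. (2019) framework sets out the sections that a complete model card should address. The paper lists nine; they are grouped here into eight by combining "Ethical Considerations" with "Caveats and Recommendations":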
1. Model Details. Name, version, type of model (e.g., large language model, retrieval-augmented generation system), date of training or release, and contact information for the model team. This section establishes what specific system is being described — important because vendors often release multiple model versions with meaningfully different capabilities.
2. Intended Use. Primary intended use cases (e.g., "contract clause extraction for NDA review") and explicit out-of-scope uses (e.g., "not intended for use in criminal proceedings or regulatory filings"). Out-of-scope use disclosures are where vendors flag applications where the model has not been evaluated and should not be trusted without additional safeguards.
3. Factors. The characteristics of inputs and populations across which the model's behavior may vary. For a legal AI system, relevant factors might include jurisdiction (US vs. EU law), document language, practice area, and document length. A model card should identify which factors were evaluated and which were not.
4. Metrics. The performance measures used to evaluate the model — accuracy, precision, recall, F1 score, hallucination rate — and the decision thresholds applied. Metrics without context are not useful: a citation accuracy rate of 95% is meaningless without knowing what corpus was tested, what types of citations were included, and how errors were defined.
5. Training Data. A summary of the datasets used to train the model — not necessarily their full contents, but sufficient description to assess data provenance, recency, and potential biases. Training data disclosure is where legal AI vendors are most likely to provide incomplete information, often citing proprietary data agreements as the constraint.
6. Evaluation Data. The benchmark datasets used to assess performance after training. Knowing whether evaluation was performed on a held-out portion of the training data (less rigorous) or on an independent benchmark dataset (more rigorous) is critical for interpreting the performance metrics.
7. Quantitative Analyses. Performance results disaggregated by the factors identified in section 3. This is where a complete model card shows performance variation across jurisdictions, document types, or time periods — the information most directly useful for identifying where additional verification is required.
8. Ethical Considerations and Caveats. Known failure modes, potential harms, unresolved questions about the model's behavior, and recommendations for use. This section often contains the most practically relevant information for lawyers.
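The eight sections above lend themselves to a simple completeness check when reviewing vendor documentation. The sketch below is illustrative only: the `ModelCard` class, its field names, and the sample vendor text are assumptions made for this example, not a standard schema or any vendor's format. It records what a vendor has supplied for each section and lists the sections left empty.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container mirroring the eight sections listed above."""
    model_details: str = ""
    intended_use: str = ""
    factors: str = ""
    metrics: str = ""
    training_data: str = ""
    evaluation_data: str = ""
    quantitative_analyses: str = ""
    ethical_considerations_and_caveats: str = ""


def missing_sections(card: ModelCard) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


# Illustrative vendor documentation: only three sections have any content.
vendor_card = ModelCard(
    model_details="Contract review assistant, v2.1",
    intended_use="Clause extraction for NDA review",
    metrics="Citation accuracy reported on an internal test set",
)

gaps = missing_sections(vendor_card)
print(f"{len(gaps)} of 8 sections missing: {gaps}")
```

A check like this only shows whether a section is present. Whether the content is substantive still needs a human read, since a one-line entry would pass.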
EU AI Act Article 53 requires GPAI model providers to maintain technical documentation that covers substantially the same ground (training data description, architecture, capabilities, and limitations) and to publish a sufficiently detailed summary of the content used for training. The regulation does not use the term "model card," but the functional requirement is closely comparable.
How Legal AI Vendors Address It
Harvey AI maintains limited public-facing model documentation. Enterprise contracts provide more detail, including information about the underlying foundation models (Harvey has disclosed using Anthropic and OpenAI models as components of its system). For the purposes of due diligence, lawyers evaluating Harvey must rely on the underlying foundation model cards from Anthropic and OpenAI, which are more complete, and on Harvey's enterprise documentation, which is not publicly available. The structural limitation is that Harvey's value-add layer — the legal-specific fine-tuning and retrieval architecture — is not documented in publicly accessible model card form as of 2026.
Lexis+ AI (LexisNexis) publishes benchmark performance data for citation accuracy and task-specific metrics in its marketing and support materials. The underlying model architecture is not disclosed in detail. LexisNexis's primary transparency mechanism is data provenance: the system is grounded in the Lexis legal research database, and the citation of primary sources is auditable. This creates a partial substitute for a technical model card: you can verify whether a cited case exists, even if you cannot evaluate the model's internal workings. Limitation: data provenance transparency does not address the full scope of model card information, particularly regarding factors affecting performance variation across practice areas and jurisdictions.
LegalFly is a European legal AI provider that has invested in regulatory compliance transparency ahead of the EU AI Act Article 53 requirements. LegalFly's documentation approach is more aligned with the model card framework than most US competitors, including disclosure of training data characteristics and intended use scope. This EU-native compliance orientation reflects the regulatory pressure on providers operating in European markets. Limitation: LegalFly's performance benchmarks are published against European legal systems; performance data for US law is less detailed.
The broader transparency gap is significant. A 2025 review of 14 legal AI vendors by the Stanford CodeX Center found that only 3 published documentation that addressed all eight standard model card sections in substantive form. The remaining 11 published partial documentation ranging from performance marketing claims to brief FAQ pages. This gap makes independent vendor evaluation difficult and increases lawyers' dependence on vendor-supplied representations.
How Lawyers Should Verify and Apply It
- Request the model card or technical documentation sheet in writing before signing any AI vendor contract. A vendor's failure to provide this documentation is itself a relevant data point. If the vendor cannot describe what their model does and does not do in a structured format, the adequacy of their own internal evaluation should be questioned.
- Cross-reference the intended use disclosures against your actual use case. If you intend to use a contract AI tool for employment agreement review and the model card lists "software vendor agreements" as the primary evaluation use case, you are operating outside the validated envelope. Ask the vendor for evaluation data specific to your use case, or build additional verification steps into your workflow.
- Identify the training data cutoff date and assess it against your practice area. Verify that the cutoff is recent enough for your subject matter. For practice areas with high regulatory churn (healthcare, financial services, immigration), treat any AI output referencing statutes, regulations, or agency guidance as requiring independent currency verification.
- Review the quantitative analyses section for performance disaggregation. If the vendor does not publish disaggregated performance data by jurisdiction, document type, or practice area, ask whether such data exists and whether it can be provided under NDA. Aggregate accuracy metrics mask the performance variation that is most relevant to specific legal tasks.
- Record your due diligence. Document that you reviewed the model card, what limitations you identified, and what additional verification steps you built into your workflow. If a client matter later involves a question about whether the AI tool was used responsibly, contemporaneous documentation of your evaluation process is the evidence that ABA Rule 1.1 competence obligations were met.
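The point about aggregate metrics in the disaggregation step can be shown with a short worked example. The figures below are invented for illustration and do not describe any real tool, and the helper `accuracy_by_group` is a hypothetical name. The sketch computes one overall accuracy figure and then the same figure split by jurisdiction.

```python
from collections import defaultdict

# Hypothetical evaluation results: (jurisdiction, was the output correct?)
results = (
    [("US", True)] * 90 + [("US", False)] * 10
    + [("EU", True)] * 12 + [("EU", False)] * 8
)


def accuracy_by_group(rows):
    """Return overall accuracy and a dict of accuracy per group label."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in rows:
        totals[group] += 1
        correct[group] += ok  # True counts as 1, False as 0
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


overall, per_group = accuracy_by_group(results)
print(f"overall: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

With these invented numbers the overall accuracy is 0.85, while the EU slice alone is 0.60. A firm doing mostly EU work would be misled by the headline figure, which is why the disaggregated table is worth requesting.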