We respect attorney-client confidentiality. No tracking pixels in our emails.

2026/04/07
Last reviewed: 2026/05/18
TL;DR · We score every legal AI tool across 5 dimensions — Accuracy, Speed, Usability, Value, Security — on a 1–5 scale with 0.1 precision. Editorial scores are independent of Featured placement. Vendors cannot pay to alter scores. Below: why we picked these 5 dimensions, how each is measured, and how we handle conflicts of interest.
The legal AI market in 2026 is worth over $5 billion annually and growing 40%+ year over year. Over 40 vendors offer credible products. Most lawyers evaluating these tools rely on one of three sources:
None of these answer the question a lawyer actually wants answered: "For my practice area, my firm size, and my risk tolerance — which of these 40 tools is right, and which would be a costly mistake?"
The risk of getting this wrong is no longer theoretical. In Mata v. Avianca (2023), two attorneys submitted a brief citing six cases — all fabricated by ChatGPT. As of early 2026, Damien Charlotin's AI Hallucination Cases Database has documented 486 such cases worldwide, implicating 128 lawyers and 2 judges, with over $50,000 in court-assessed fines. A Stanford HAI benchmark found that Lexis+ AI hallucinated on 17% of legal research queries and Westlaw AI on 33% — and these are the premium tools.
A lawyer choosing legal AI is not choosing software. They are choosing a level of liability exposure.
LawyerAI exists to make that choice clearer. This methodology is our attempt to do so transparently.
Every tool in our directory is scored on five dimensions, each from 1.0 to 5.0 in 0.1 increments:
What it measures: Factual correctness of AI output in the context of legal work.
For legal research tools (CoCounsel, Westlaw Precision, Lexis+ AI, Casetext, Paxton AI), accuracy is dominated by citation reliability — does the tool fabricate citations, misquote holdings, or apply overruled precedent? We weight this against published benchmarks where available (Stanford HAI, vendor-disclosed evaluations) and triangulate with practitioner interviews.
For contract review tools (Spellbook, Luminance, Kira, LawGeex, Trellis), accuracy is redline precision: does the tool catch the clauses a senior associate would catch, without over-flagging boilerplate? We score against industry playbooks (ABA contract checklists, BarbriBenchmark frameworks) and real contracts of varying complexity.
For litigation and eDiscovery tools (Everlaw, Relativity AI, Briefpoint, Clearbrief), accuracy means correct privilege calls, accurate witness analysis, and reliable case timeline construction.
Harvey AI claims an internal hallucination rate of approximately 0.2% — orders of magnitude better than generalist LLMs. We treat such vendor-disclosed numbers as inputs to our score, not as the score itself. Independent verification matters more than self-reported metrics.
A score of 5.0 on Accuracy means citation-grade output for the tool's primary use case, with a documented hallucination rate under 1%, verified through at least two independent sources.
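The three conditions for a 5.0 can be read as a simple all-or-nothing check. The sketch below is illustrative only: the `AccuracyEvidence` record, its field names, and the example values are hypothetical and do not describe our internal tooling or any real vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyEvidence:
    """Hypothetical record of the evidence behind an Accuracy score."""
    hallucination_rate: float  # fraction of outputs with fabricated content, e.g. 0.008 = 0.8%
    independent_sources: int   # number of non-vendor sources confirming that rate
    citation_grade: bool       # editorial judgment for the tool's primary use case


def qualifies_for_top_accuracy(evidence: AccuracyEvidence) -> bool:
    """Return True only when every stated 5.0 criterion holds."""
    return (
        evidence.citation_grade
        and evidence.hallucination_rate < 0.01   # "under 1%"
        and evidence.independent_sources >= 2    # "at least two independent sources"
    )


# A low self-reported rate with no outside verification does not qualify.
vendor_only = AccuracyEvidence(hallucination_rate=0.002, independent_sources=0, citation_grade=True)
# A slightly higher rate that two independent sources confirm does.
verified = AccuracyEvidence(hallucination_rate=0.008, independent_sources=2, citation_grade=True)

print(qualifies_for_top_accuracy(vendor_only))
print(qualifies_for_top_accuracy(verified))
```

The point of the sketch is that a self-reported number, however low, cannot reach 5.0 on its own.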
What it measures: Real-world response time for typical tasks, not marketing benchmarks.
A research tool that takes 90 seconds to return a memo is fundamentally different from one that takes 8 seconds. A contract review tool that processes a 100-page master service agreement in 4 minutes serves a different workflow than one that takes 45 minutes. We measure:
We do not score speed in a vacuum. A slower tool with citation validation built in is often more valuable than a faster tool that requires human re-verification. We score speed in the context of the workflow the tool serves.
What it measures: How much friction stands between a lawyer and value.
The single biggest predictor of whether a law firm gets ROI from legal AI is not the model quality — it is whether the lawyers actually use the tool. Harvey, Spellbook, and Ironclad all understand this; their products integrate with Microsoft Word, Outlook, and existing document management systems precisely because lawyers will not change their workflow for software, no matter how powerful.
Usability includes:
A 5.0 on Usability is a tool a solo practitioner can master in a weekend.
What it measures: Cost per outcome — not cost per seat.
Legal AI pricing in 2026 ranges from free tiers (Smith.ai's basic plan) to $40,000+/year for small enterprise deployments (Harvey, with a typical 10-seat minimum). Mid-market tools like CoCounsel and Spellbook fall in the $50–$200/user/month range. Practice management platforms like Clio and MyCase bundle their AI features into existing subscriptions.
The honest comparison is not "Tool A costs less than Tool B." The honest comparison is "Tool A delivers X outcomes per dollar for this practice profile." We score Value by:
A tool with opaque pricing is scored down on Value regardless of its capability. Transparency is itself a feature.
What it measures: Confidentiality posture and data handling — table stakes that turn out not to be table stakes.
Lawyers operate under ABA Model Rule 1.6 (Confidentiality of Information) and ABA Model Rule 1.1, Comment 8 (Competence, which covers the benefits and risks of relevant technology and which most jurisdictions now read to include AI). Using a legal AI tool that mishandles client data is not just a security failure; it is a potential ethics violation.
We score Security on:
A tool that cannot pass enterprise legal department security review will score under 3.0 on Security regardless of how good the AI itself is. A lawyer who deploys it anyway is taking on personal liability the tool cannot absorb.
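The rule above acts as a ceiling on the Security score. A minimal sketch follows, assuming that "under 3.0" on a 0.1-step scale means a cap of 2.9. The function name and the `passes_enterprise_review` flag are hypothetical, chosen only for illustration.

```python
SECURITY_REVIEW_CAP = 2.9  # assumption: highest 0.1-step value that is still "under 3.0"


def apply_security_cap(raw_score: float, passes_enterprise_review: bool) -> float:
    """Cap a Security score when the tool cannot pass enterprise security review."""
    if not 1.0 <= raw_score <= 5.0:
        raise ValueError(f"score must be between 1.0 and 5.0, got {raw_score}")
    if passes_enterprise_review:
        return raw_score
    # The cap only ever lowers a score; it never raises one.
    return min(raw_score, SECURITY_REVIEW_CAP)


print(apply_security_cap(4.4, passes_enterprise_review=False))  # capped
print(apply_security_cap(4.4, passes_enterprise_review=True))   # unchanged
print(apply_security_cap(2.1, passes_enterprise_review=False))  # already below the cap
```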
Other frameworks exist. Harvey publishes a 7-criteria evaluation guide. Ironclad uses a "4 Cs" framework (Cost, Capability, Compliance, Control). Vendor lists often score on a dozen attributes, most of which correlate.
We chose 5 because it is the smallest number that captures the trade-offs lawyers actually make:
Five dimensions force these trade-offs to be visible on every tool page. A reader looking at our scoring for, say, Spellbook can see instantly that it scores high on Usability and Value but is constrained on Accuracy for novel jurisdictions. A reader looking at Harvey AI sees a different shape — top Accuracy and Security, mid-range Value due to enterprise pricing.
The shape of the score tells you more than the average.
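To see why, compare two score profiles with the same average. The sketch below uses invented numbers for two hypothetical tools, not scores from our directory. The range (highest minus lowest dimension) is just one simple, assumed way to summarize shape.

```python
from statistics import mean

DIMENSIONS = ("accuracy", "speed", "usability", "value", "security")


def validate(scores: dict) -> None:
    """Check that every dimension is present, within 1.0-5.0, and on a 0.1 step."""
    for name in DIMENSIONS:
        value = scores[name]
        if not 1.0 <= value <= 5.0:
            raise ValueError(f"{name} out of range: {value}")
        if abs(value * 10 - round(value * 10)) > 1e-9:
            raise ValueError(f"{name} is not a multiple of 0.1: {value}")


def summarize(scores: dict) -> tuple:
    """Return (average, range) for one tool's five scores."""
    validate(scores)
    values = [scores[name] for name in DIMENSIONS]
    return round(mean(values), 2), round(max(values) - min(values), 1)


# Hypothetical profiles, for illustration only.
specialist = {"accuracy": 4.8, "speed": 3.5, "usability": 3.5, "value": 2.2, "security": 4.8}
generalist = {"accuracy": 3.8, "speed": 3.8, "usability": 3.7, "value": 3.7, "security": 3.8}

print(summarize(specialist))
print(summarize(generalist))
```

Both profiles average 3.76, yet one spans 2.6 points between its best and worst dimension and the other only 0.1. The average hides exactly the trade-off a buyer needs to see.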
This is the part that matters most, and the part most "objective" review sites quietly skip.
LawyerAI will eventually charge vendors for Featured placement and enhanced listings. We are transparent about this. Featured listings exist because vendor sponsorship is the most ethically straightforward way to fund independent editorial work — more straightforward than affiliate commissions (which incentivize promoting the tools that pay best, not the tools that work best) and more sustainable than ad networks (which place us at the mercy of Google's algorithm).
Featured placement does not affect editorial scores. This is not aspirational. It is structural:
A vendor cannot pay to be scored higher. A vendor cannot pay to suppress a competitor's score. A vendor who tries either will have the attempt disclosed in our annual transparency report.
This independence is also why our reviews include a "Hands-on review pending" note where appropriate. For some tools — especially recent launches and enterprise platforms that resist trial access — our scoring relies on triangulated secondary sources rather than direct testing. We say so. A score we cannot fully verify is not a score we will pretend to verify.
Honest methodology requires honest limits.
We cannot tell you whether your firm should buy this tool. Scoring is a vector, not a recommendation. A 4.8 on Accuracy with a 2.2 on Value might be perfect for AmLaw 50 M&A practice and irresponsible for a 4-person plaintiff's PI shop. The dimensions are designed so the trade-offs are visible; the decision still belongs to the firm.
We cannot benchmark every tool against every workflow. Our 5-dimension scores are weighted toward the primary use case of each tool. A score of 4.5 on Accuracy for Westlaw Precision means it is highly accurate for legal research. It does not predict how it will perform at contract redlining (it was not built for that and we will not score it that way).
We cannot score what we cannot test. Tools that gate access behind enterprise sales processes, refuse trial accounts, or require non-disclosure of evaluation results are scored on what we can verify externally. Those scores carry an "Indicative" label and are revised when direct evaluation becomes possible.
We cannot predict the future. The legal AI market of 2027 will look different from the market of 2026. Vendors release new models, pricing changes, and certifications get added or revoked. Every score in our directory carries a Last Reviewed date. Any score older than 6 months is flagged for re-review.
When you visit any tool page on LawyerAI (say, Harvey AI or Spellbook), you will see five horizontal bars. Each bar shows that dimension's score from 1.0 to 5.0, with the numeric score displayed in gold. The overall card on category pages and the comparison matrix use these same five numbers.
Three reading rules:
The five dimensions are stable. The way each is measured will refine over time:
If you are a lawyer using these tools, your feedback is the input that matters most. If you are a vendor whose tool we have scored, we publish a formal score-revision-request process and respond to all credible challenges within 30 days. The methodology is open to be argued with. It is not open to be paid to change.
No. Editorial scoring and Featured sales operate on separate workflows with no shared personnel. Featured customers are visually labeled "Sponsored" wherever placement is paid; everywhere else, position is driven by score.
Where independent benchmark data exists (Stanford HAI for legal research tools, internal practitioner testing for contract review), it is the dominant input. Where it does not exist, we triangulate vendor-disclosed metrics, practitioner interviews, and direct testing. Hallucination rate is one of three factors in Accuracy — the others being citation precision and reasoning quality.
Because we have not yet directly tested the tool. Some enterprise platforms gate evaluation access behind sales processes that take months. We score what we can verify externally and label what we cannot.
Yes. Vendors can submit a Score Revision Request explaining what evidence they believe was missed or misinterpreted. The Editorial Standards Board responds within 30 days. Score changes are published with a brief note explaining the revision. Vendors who escalate beyond evidence-based revision requests — through legal threat, social pressure, or commercial leverage — are flagged in our annual transparency report.
Every tool carries a Last Reviewed date. Tools whose last review is more than 6 months old are flagged for re-review. Tools that release major model updates trigger immediate re-evaluation. Tools with public security incidents or material pricing changes are re-reviewed within 30 days.
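These triggers can be combined into a single "re-review due by" date. The sketch below rests on several assumptions: "6 months" is approximated as 183 days, "immediate" is treated as the day of the model update, and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=183)  # assumption: "6 months" approximated as 183 days
EVENT_WINDOW = timedelta(days=30)  # security incidents and material pricing changes


def re_review_due(
    last_reviewed: date,
    major_model_update: Optional[date] = None,
    incident_or_pricing_change: Optional[date] = None,
) -> date:
    """Return the earliest date by which a tool should be re-reviewed."""
    # Staleness always applies, so there is always at least one candidate date.
    candidates = [last_reviewed + STALE_AFTER]
    # Events only matter if they happened after the last review.
    if major_model_update is not None and major_model_update > last_reviewed:
        candidates.append(major_model_update)  # "immediate" re-evaluation
    if incident_or_pricing_change is not None and incident_or_pricing_change > last_reviewed:
        candidates.append(incident_or_pricing_change + EVENT_WINDOW)
    return min(candidates)


reviewed = date(2026, 5, 18)
print(re_review_due(reviewed))
print(re_review_due(reviewed, incident_or_pricing_change=date(2026, 7, 1)))
print(re_review_due(reviewed, major_model_update=date(2026, 6, 10),
                    incident_or_pricing_change=date(2026, 7, 1)))
```

Taking the earliest candidate means a model update or incident can only bring a re-review forward, never push it past the 6-month flag.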
LawyerAI is an independent directory of AI tools for lawyers, scored across 5 dimensions. We do not accept affiliate commissions. Featured placement is clearly disclosed and does not influence editorial scoring. For methodology questions, score revision requests, or editorial contact: editor@lawyerai.directory.
This methodology is version 1.0, published 2026-05-18. Material revisions will be versioned and dated.