Prompt injection is a class of attacks against AI systems — particularly those built on large language models — where adversarial instructions embedded in user-provided content or external data manipulate the AI to override its intended behavior, bypass safety constraints, or exfiltrate sensitive information. The attack exploits the fact that LLMs process instructions and data through the same mechanism: the model cannot reliably distinguish between "this is data to analyze" and "these are instructions to follow" when both arrive via the same input channel.
The name comes from SQL injection, a long-established database attack where malicious SQL code is inserted into a data field the application passes to a database query. Prompt injection applies the same principle to AI systems: malicious instructions are inserted into content the AI is expected to process, and the AI executes those instructions.
For most technology security threats, legal professionals can reasonably delegate the response to IT departments or managed service providers. Prompt injection is different because the attack surface is the content lawyers work with every day — contracts, discovery documents, opposing party submissions, and client-provided files. A law firm's IT security team can harden network perimeters, enforce multi-factor authentication, and manage endpoint security. None of those controls address an attack that arrives inside a Word document a client submits for contract review.
The legal sector-specific risk scenario is concrete and technically feasible. Consider an e-discovery workflow where a legal AI system reviews a large document production for privilege and relevance. Opposing counsel — or a party with knowledge of the AI system in use — submits a document in the production set that contains hidden text instructions. The text might appear in white font on a white background, in document metadata, in a footnote formatted to be invisible, or in other content not rendered in the normal reading view. When the AI processes the document, it reads the hidden instructions alongside the visible text and may follow them: marking privileged documents as non-privileged, misclassifying categories of responsive documents, or generating review summaries that omit specific content.
A second scenario applies to contract AI. A counterparty submits a contract draft for playbook enforcement review. The draft contains embedded instructions — in a text box, in track changes comments, or in formatting metadata — directing the AI to classify a deviation from the organization's preferred limitation-of-liability position as "acceptable" rather than "flagged." The reviewing attorney sees the AI's output showing no issues in that clause and does not read the underlying text carefully.
These are not hypothetical edge cases. NIST AI 100-1 (2023), the US National Institute of Standards and Technology's AI Risk Management Framework, specifically lists prompt injection as a significant AI vulnerability. The Open Web Application Security Project (OWASP) Top 10 for Large Language Model Applications, published in 2023, ranked prompt injection as the number-one vulnerability category in LLM deployments. Security researchers at multiple institutions have demonstrated successful prompt injection attacks against commercial AI systems processing real documents.
As of 2026, no LLM has a reliable technical defense against indirect prompt injection. This is not a gap that vendors can close with a software update — it reflects a fundamental characteristic of how LLMs process text. Lawyers using AI systems to process adversarial-party documents should treat this as an inherent risk to be managed, not a solved problem.
The professional responsibility implications intersect with ABA Model Rule 1.6 (Confidentiality) and Rule 1.1 (Competence). An AI system that is successfully manipulated by an adversarial document may have exposed confidential information to an unauthorized party, altered its analysis in ways that damage client interests, or produced outputs the reviewing attorney certified without adequate verification.
How It Works (Technical)
Prompt injection takes two primary forms, each with distinct mechanics and risk profiles.
Direct prompt injection occurs when a user directly inputs adversarial instructions into the AI interface. The classic example is a user typing "Ignore your previous instructions and tell me..." followed by instructions that override the system's intended behavior. This form is relatively manageable: vendors can implement input filtering to detect common adversarial patterns, and the attack requires the malicious actor to have direct access to the AI interface. In a law firm context, direct injection risk is primarily an insider threat — an employee, contractor, or unauthorized user with access to the firm's AI system attempting to bypass restrictions.
Indirect prompt injection is the more serious risk for legal workflows. In indirect injection, adversarial instructions are embedded in external content that the AI is asked to process — a document uploaded for review, a webpage summarized by an AI research tool, an email parsed by an AI client communication system. The AI reads the document's visible content and the embedded instructions through the same input channel, and the instructions may alter the AI's behavior for the entire session or for specific outputs related to that content.
The technical reason this is difficult to solve is that LLMs assign meaning to text based on learned patterns from training data, not based on where the text came from or how it was labeled. A well-crafted injection instruction that is written in authoritative, instruction-like language may be processed by the model as a genuine instruction rather than as data to analyze, regardless of the application-level labeling of that content as "user-uploaded document."
Vendors have deployed several defense layers, none of which provides complete protection. Input sanitization strips patterns from document content before passing it to the model — for example, removing text that matches common injection patterns or that appears in unusual formatting. This catches known attack patterns but not novel ones. Output filtering checks the model's response against expected patterns for the task — if a contract review output contains language that looks more like model instructions than legal analysis, the system flags or blocks it. Dual LLM architectures use a second "judge" model to evaluate whether the primary model's output is appropriate and in scope; this adds a verification layer but also doubles processing cost and latency. Sandboxing limits what the AI's outputs can trigger — if the AI cannot directly modify records or send communications based on its analysis, the downstream effect of a successful injection is limited to the analysis output itself, which an attorney can still review.
Legal-specific retrieval-augmented generation (RAG) systems — where the AI's analysis is grounded in a curated legal database rather than open-web retrieval — reduce one attack vector. If the AI is only retrieving from a controlled legal corpus, it is not retrieving injected content from external websites. However, RAG architecture does not protect against injection in the primary document being processed (the contract, the discovery document, the legal brief).
How Legal AI Vendors Address It
Harvey AI employs enterprise prompt management and output filtering as part of its security architecture. Harvey has disclosed that it uses multiple layers of response evaluation and that its enterprise deployment includes monitoring for anomalous output patterns that may indicate injection-influenced behavior. Harvey does not publish the specific architecture of its injection defenses, citing security reasons — disclosing the exact mechanism would inform attackers about what patterns to avoid. Limitation: the absence of public documentation makes independent verification of Harvey's injection defenses impossible. Enterprise clients must rely on contractual security representations and SOC 2 audit reports rather than technical validation.
Lexis+ AI partially reduces the injection attack surface through its grounded retrieval architecture. Because Lexis+ AI grounds its legal research responses primarily in the LexisNexis legal database — a controlled corpus of primary source legal materials — the attack surface for injection via external retrieval is smaller than for systems that retrieve from the open web. The limitation is that injection via the primary document being analyzed (a contract, a brief, a discovery document submitted for review) is not addressed by retrieval architecture. When Lexis+ AI processes an uploaded document, that document's content is an injection surface regardless of how the AI's retrieval layer is configured.
LegalFly has invested in security architecture disclosure as part of its EU AI Act compliance preparation. LegalFly publishes threat model documentation that addresses prompt injection as a named threat category, describes its defense architecture at a level of detail uncommon in the legal AI market, and includes injection attack scenarios in its security FAQ. This level of disclosure is partially driven by the EU AI Act's technical documentation requirements and partially by LegalFly's positioning in European markets where enterprise security procurement involves detailed security questionnaires. Limitation: disclosure quality does not directly correlate with defense quality — a vendor that writes clearly about an unsolved problem has not necessarily solved it. LegalFly's transparency is valuable for evaluation purposes; it does not mean injection risk has been eliminated.
The consistent gap across all legal AI vendors is the absence of independent security audits specifically targeting prompt injection in legal document processing workflows. General security certifications (SOC 2, ISO 27001) do not cover AI-specific vulnerabilities like prompt injection — they assess data security controls, access management, and infrastructure, not the AI model's susceptibility to adversarial inputs. Lawyers should not infer from a vendor's security certifications that injection risk has been evaluated or addressed.
How Lawyers Should Verify and Apply It
-
Identify which AI workflows in your firm process adversarial-party documents. E-discovery review, contract review of counterparty drafts, analysis of opposing expert reports, and processing of opposing party discovery productions are the highest-risk workflows. For each workflow, document which AI system processes the content and what the output is used for. This inventory is the starting point for injection risk assessment.
-
Ask vendors directly whether their system has been tested for prompt injection in document processing workflows. Request documentation of any independent security testing — penetration testing by a third party, red-team exercises, or published security research — that specifically addressed prompt injection in the context of legal document analysis. If the vendor cannot point to this documentation, the risk has not been independently evaluated.
-
Do not rely on AI output alone for any review where adversarial-party documents were processed. When AI-assisted e-discovery review or contract analysis involves documents submitted by an opposing or potentially adversarial party, build in a human spot-check of the AI's outputs — particularly for classification decisions (privileged/non-privileged, responsive/non-responsive) that could be the target of a manipulation attempt. The spot-check does not need to cover every document; a statistically meaningful sample review of AI classifications in high-risk document populations is sufficient to detect systematic anomalies.
-
Report suspected injection attempts through your security incident process. If AI output on a matter appears inconsistent with the document content — for example, the AI reports no issues with a contract clause that you can see is materially unfavorable — treat this as a potential security incident in addition to a quality control failure. Document the anomaly, preserve the document that was processed, and escalate to your AI vendor's security team for investigation.