How Large Language Models Are Reshaping Phishing

For roughly two decades, "bad grammar" was on every list of phishing red flags. The advice was sound: most phishing operators were not native English speakers, and the lures showed it. Spelling mistakes, awkward syntax, and stilted formal English were among the most reliable indicators that an email was malicious.

Large language models have eliminated that signal entirely. In 2026, any threat actor can produce native-quality English, French, German, Spanish, Mandarin, Korean, Polish, Portuguese, and dozens of other languages on demand. The grammar tells are gone, and what is left is a richer, more concerning threat picture.

What LLMs change in the lure economy

The mechanics of LLM-assisted phishing are not exotic. Threat actors use commodity LLM access (ChatGPT, Claude, Gemini, or open-weight models running locally) to generate or refine email copy. The main transformations are:

Native-quality writing in any target language. Phishing campaigns can now economically target speakers of languages where the operator has no fluency. The Polish, Czech, and Lithuanian phishing markets have grown substantially since 2023.

Personalisation at scale. An LLM can take a target’s LinkedIn profile, recent news mentions, employer information, and conversational style and generate an email that references all of them naturally. Spear phishing, once expensive because every lure had to be personalised by hand, can now be automated at the scale of mass phishing.

Tone matching. The LLM can generate copy in formal corporate English, casual peer-to-peer English, or any other register requested. Lures impersonating an executive sound like an executive; lures impersonating a colleague sound like a colleague.

Iteration speed. An operator can generate dozens of variants of a lure in minutes and select the most plausible. The optimisation that used to require A/B testing now happens at the prompt level.

Branded clone refinement. LLMs can generate or refine HTML email templates that mimic legitimate brand emails with high fidelity. Mandiant and IBM X-Force both documented increased fidelity in branded phishing through 2024.

What LLMs do not change

Several aspects of the phishing economy are unchanged:

Infrastructure remains a bottleneck. Domains, hosting, and TLS certificates are still operational requirements. LLMs do not generate these.

Credential collection still requires phishing kits. The AiTM kits described in the separate phishing-anatomy post (Tycoon 2FA, Mamba 2FA, Evilginx) handle the technical capture; LLMs only handle the lure.

Detection and response can still operate on infrastructure indicators. URL reputation, sender domain, sending behaviour patterns, link analysis, and AiTM-specific detections continue to work regardless of the lure quality.

The user’s behavioural pattern matters. Whether a user clicks a link, enters credentials, and approves an MFA prompt is still a function of trust, urgency, and context, qualities that LLM-improved lures can manipulate but not eliminate.
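Link analysis, one of the infrastructure-level checks listed above, ignores the wording of a lure entirely. As a minimal sketch using only the Python standard library, the hypothetical `mismatched_links` function below flags anchors whose visible text names one host while the `href` points at another. The function names and example domains are illustrative assumptions, not any product's API, and the exact-host comparison is deliberately naive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) tuples
        self._href = None    # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    """Lower-cased hostname, or None if the string has no parseable host."""
    if "://" not in url:
        url = "http://" + url  # treat bare "example.com/login" text as a URL
    host = urlparse(url).hostname
    return host.lower() if host else None


def mismatched_links(html):
    """Return (text, href) for links whose text shows one host but point at another."""
    parser = LinkCollector()
    parser.feed(html)
    suspicious = []
    for href, text in parser.links:
        # Only treat link text as a URL if it looks like one (has a dot, no spaces).
        shown = host_of(text) if "." in text and " " not in text else None
        actual = host_of(href)
        if shown and actual and shown != actual:
            suspicious.append((text, href))
    return suspicious


sample = ('<p>Sign in at <a href="https://portal-example.com.evil.test/login">'
          'portal.example.com</a> today.</p>')
print(mismatched_links(sample))
# -> [('portal.example.com', 'https://portal-example.com.evil.test/login')]
```

A production check would compare registrable domains (using the Public Suffix List) rather than exact hosts, so that `example.com` text linking to `www.example.com` is not flagged, and would also follow redirects and URL-rewriting services before comparing.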

The measurable evidence

Several studies have measured LLM impact on phishing effectiveness:

A 2023 Hoxhunt study compared click-through rates on AI-generated phishing emails vs. human-generated ones across roughly 50,000 employees. AI-generated lures performed competitively with skilled human operators after iterative refinement.

IBM X-Force’s "X-Force Threat Intelligence Index" through 2024 documented a meaningful increase in cross-language phishing campaigns, with attribution to LLM availability based on consistent stylistic markers.

Microsoft Threat Intelligence’s reporting on "Phishing-as-a-Service" markets through 2024 shows multiple kit operators advertising "AI-assisted lure generation" as a feature.

Anthropic’s usage-policy reports and OpenAI’s transparency reports describe identified bad-actor use of their APIs to generate phishing content, indicating volumes large enough to be detected.

The aggregate signal: LLM-assisted phishing is not a hypothetical; it is the default in 2026, and the median quality of phishing email is meaningfully higher than it was in 2022.

Personalised attacks that were not viable before

The category that has changed most is personalisation. Pre-LLM, sending a thousand emails each personalised based on the recipient’s recent activity required either a thousand minutes of analyst time or a generic template that did not actually personalise. Post-LLM, the same thousand emails can be generated automatically with personalisation that references the recipient’s specific job title, employer’s recent news, conversational history (for accounts where the attacker has read access), and tone preferences.

Practical consequences:

Spear phishing at the volume of mass phishing. The category boundary between targeted and non-targeted phishing has eroded.

Conversation-aware lures. Attackers with access to a compromised email account can use LLMs to read prior conversations and draft replies that continue threads naturally. The "thread hijacking" technique that emerged with Emotet is dramatically improved by LLM assistance.

Multi-stage lure refinement. The LLM can suggest follow-up messages based on user response, automating what used to be live human social engineering.

Cross-channel coordination. LLM-driven content for email, SMS (smishing), Teams/Slack messages, and voice scripts produces coherent campaigns across channels.

Defence: what works

The defensive playbook adapts but does not transform:

Phishing-resistant MFA remains the highest-leverage control. The lure quality does not affect the cryptographic protection; passkeys and FIDO2 hardware keys do not authenticate to the wrong origin regardless of how convincing the lure is.

Email authentication (SPF, DKIM, DMARC at p=reject) catches a lot of impersonation regardless of lure content.

Behavioural detection of unusual access patterns. The post-compromise activity is similar regardless of lure sophistication; conditional access policies and impossible-travel detection continue to fire.

Content-based detection has degraded. Filters that historically caught phishing through text-pattern matching are less effective; ML-driven email security (Abnormal Security, Microsoft Defender for Office 365’s improved layers, Proofpoint’s evolved engines) remains useful but is in an arms race with LLM-generated copy.

User reporting culture. The "Report Phish" button is more important than ever, because the visible cues are gone but the operational pattern (urgency, unusual request, link to login page) remains. A user trained to question the request itself, not the language quality, can still flag the email.
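The origin binding behind the phishing-resistant MFA point above is concrete. In WebAuthn, the browser writes the page's real origin into `clientDataJSON`, and the authenticator data begins with a SHA-256 hash of the relying party ID. The browser will not let a page on another domain request an assertion for `example.com` in the first place, and because the signature covers a hash of `clientDataJSON`, a proxy cannot rewrite the recorded origin without invalidating the assertion. The Python sketch below illustrates a subset of the relying party's assertion-verification checks under stated assumptions; it is not a complete verifier. The function name and domains are hypothetical, and the signature check against the stored public key is omitted.

```python
import base64
import hashlib
import json


def b64url_decode(s):
    """Decode unpadded base64url, the encoding WebAuthn uses for the challenge."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def check_assertion_binding(client_data_json, authenticator_data,
                            expected_challenge, expected_origin, rp_id):
    """Origin and RP-ID checks from WebAuthn assertion verification (partial sketch).

    client_data_json: raw bytes produced by the browser (it fills in the real origin).
    authenticator_data: raw bytes produced by the authenticator.
    NOT included: verifying the signature over
    authenticator_data || SHA-256(client_data_json) with the stored public key.
    Returns a list of failure reasons; an empty list means these checks passed.
    """
    errors = []
    client = json.loads(client_data_json)
    if client.get("type") != "webauthn.get":
        errors.append("wrong ceremony type")
    if b64url_decode(client.get("challenge", "")) != expected_challenge:
        errors.append("challenge mismatch")
    if client.get("origin") != expected_origin:
        errors.append("origin mismatch: " + str(client.get("origin")))
    # The first 32 bytes of authenticatorData are SHA-256(rp_id).
    if authenticator_data[:32] != hashlib.sha256(rp_id.encode()).digest():
        errors.append("rpIdHash mismatch")
    # Byte 32 holds the flags; bit 0 is User Present (UP).
    if len(authenticator_data) < 37 or not authenticator_data[32] & 0x01:
        errors.append("user presence flag not set")
    return errors


# Demo with placeholder values: rpIdHash + flags (UP set) + 4-byte sign counter.
challenge = b"server-issued-random-challenge"
chal_b64 = base64.urlsafe_b64encode(challenge).rstrip(b"=").decode()
auth_data = hashlib.sha256(b"example.com").digest() + bytes([0x01]) + (0).to_bytes(4, "big")

genuine = json.dumps({"type": "webauthn.get", "challenge": chal_b64,
                      "origin": "https://example.com"}).encode()
relayed = json.dumps({"type": "webauthn.get", "challenge": chal_b64,
                      "origin": "https://examp1e-login.test"}).encode()

print(check_assertion_binding(genuine, auth_data, challenge, "https://example.com", "example.com"))
# -> []
print(check_assertion_binding(relayed, auth_data, challenge, "https://example.com", "example.com"))
# -> ['origin mismatch: https://examp1e-login.test']
```

However polished the lure, the origin recorded in the assertion is the phishing site's, which is why this control does not degrade as lure quality improves.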

What to tell users

The training advice has shifted:

The "look for spelling mistakes" guidance is obsolete. Drop it from training materials.

The "is the sender address slightly wrong" guidance is still useful but increasingly defeated by display-name spoofing and domain lookalikes.

The "did you expect this email" check is the most durable signal. An unexpected email demanding action is suspicious regardless of how well-written it is.

The "verify through a separate channel" step is non-negotiable for any high-consequence action initiated by email.

The "approve only the MFA prompts you initiated" rule still works against AiTM, especially with phishing-resistant MFA.
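The "is the sender address slightly wrong" check can be partly automated so that it does not rest on the user alone. As a rough sketch, the hypothetical `lookalike_of` function below measures the Levenshtein edit distance between a sender's domain and a list of domains the organisation trusts, and flags near misses such as `rnicrosoft.com` standing in for `microsoft.com`. The threshold of two edits and the trusted-domain list are assumptions to tune, not recommended values.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lookalike_of(sender_domain, trusted_domains, max_distance=2):
    """Return the trusted domain the sender appears to imitate, or None.

    An exact match is treated as legitimate (None); a near miss within
    max_distance edits is treated as a lookalike. trusted_domains are
    assumed to be lower-case already.
    """
    sender = sender_domain.lower().rstrip(".")
    if sender in trusted_domains:
        return None
    for trusted in trusted_domains:
        if edit_distance(sender, trusted) <= max_distance:
            return trusted
    return None


print(lookalike_of("rnicrosoft.com", ["microsoft.com", "example.com"]))
# -> microsoft.com
```

Edit distance misses combosquatting (`example-secure.com`) and homoglyphs hidden in internationalised domain names, and it says nothing about display-name spoofing, where a trusted name is paired with an unrelated address. It complements the habits above rather than replacing them.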

The longer outlook

LLM-assisted phishing is the new floor, not a peak. The next progression, already visible, combines LLM-generated lures with deepfake voice and video for multi-modal social engineering. The Arup deepfake case (covered separately) is the canonical example.

The defensive investments that work are the ones that do not depend on the user noticing something is wrong. Phishing-resistant authentication, conditional access, and post-compromise detection are durable. User-judgement-based defences are increasingly fragile.

The era of grammar-based phishing detection has ended. Acknowledging that, and shifting defensive emphasis accordingly, is the central operational adjustment of 2026.
