Detecting AI-generated phishing in 2026: a header-forensics, classifier, and DKIM workflow

Updated May 2026, practitioner workflow.

The ratio has flipped. Through 2024 most phishing kits still relied on copy-paste templates and machine-translated lures; by mid-2026 the bulk of high-effort spear-phishing is generated with an LLM in the loop. Defenders looking for the old tells (rough grammar, generic salutations, awkward register shifts mid-paragraph) have lost their highest-signal indicator. Worse, the new tells are non-obvious: the message reads cleaner than one a colleague would actually write.

This is a working tutorial for analysts and SOC teams who need to triage suspect emails fast in 2026. We’ll layer three independent signals (header forensics, LLM-output classifiers, and DKIM/SPF replay analysis) to produce a defensible call on whether a message is human-authored, AI-assisted, or fully synthesised.

Why “vibes” don’t work anymore

The old triage heuristic (“does this feel like a real person wrote it?”) relied on a steady error rate in non-native phishing kits. Modern LLMs erase that error rate. They also erase the register tells: a 2026 lure can match the corporate vocabulary of any vertical because the operator simply pastes three real LinkedIn posts from the target’s manager into the prompt and asks for a similar voice.

What hasn’t changed is the metadata. The headers, transport, and authentication trail of an email are far harder to fake than the body, and they tell you what an LLM cannot reach: the reputation, history, and authentication posture of the sending infrastructure.

Layer 1, Header forensics

Every email carries a “Received:” chain showing each hop from sender to recipient. In Gmail you reveal it via Show original; in Outlook via File → Properties → Internet headers; in Thunderbird via View → Message Source. Save the raw headers to a text file before doing anything else.

Open the file and walk it bottom-up. Each relay prepends its own Received: line, so the bottom-most one is the earliest hop: the server that first accepted the message from the sender. Treat anything below the first hop recorded by your own infrastructure as sender-supplied and unverified. Compare the originating IP and hostname to the apparent From: domain. If From: reads [email protected] but the bottom-most Received: is from mail-out-13.amazonaws.com with no SPF authorisation for that EC2 IP, that’s a hard fail.
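The bottom-up walk can be sketched with Python’s standard email module. This is a minimal illustration, not a full parser: the sample headers, hostnames, and the bracketed-IPv4 pattern are assumptions made for the example, and real Received: lines vary a lot between MTAs.

```python
import re
from email import message_from_string

# Hypothetical sample headers for illustration only (documentation IP ranges).
RAW = """\
Received: from mx.recipient.example (mx.recipient.example [203.0.113.10])
\tby inbox.recipient.example with ESMTPS id abc123;
\tTue, 05 May 2026 10:02:11 +0000
Received: from relay.sender.example (relay.sender.example [198.51.100.7])
\tby mx.recipient.example with ESMTPS id def456;
\tTue, 05 May 2026 10:02:09 +0000
From: Billing <billing@brand.example>
Subject: Invoice

Body text.
"""

# Assumption: the relay IP appears as a bracketed IPv4 literal.
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw_message: str) -> list:
    """Return Received hops ordered earliest first (bottom of the header block first)."""
    msg = message_from_string(raw_message)
    hops = []
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())  # unfold continuation lines
        host = re.match(r"from\s+(\S+)", flat)
        ip = IP_RE.search(flat)
        hops.append({
            "from_host": host.group(1) if host else None,
            "ip": ip.group(1) if ip else None,
            "raw": flat,
        })
    return hops


hops = received_hops(RAW)
origin = hops[0]  # earliest hop: compare this against the From: domain
print(origin["from_host"], origin["ip"])
```

The earliest hop is the value to check against the From: domain’s SPF record; later hops mostly describe your own infrastructure.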

Three header fields earn the most attention in 2026:

Authentication-Results: emitted by the receiving MTA. Look for spf=pass, dkim=pass, and dmarc=pass. Any fail or none for a brand-name impersonation attempt is itself probative.
Received-SPF: explicit SPF judgment. softfail is a yellow flag; fail is a red one when paired with brand impersonation.
X-Mailer / User-Agent: phishing kits often forget to strip these. X-Mailer: PHPMailer behind a From: claiming to be Microsoft is a classic kit fingerprint. Newer LLM-driven kits sometimes leave X-Originating-IP headers that don’t match the claimed sender.
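To pull the three verdicts out of Authentication-Results programmatically, something like the following sketch may help. It assumes you know the authserv-id your own receiving MTA stamps (here the hypothetical mx.recipient.example) and ignores any Authentication-Results header that does not carry it, since a sender can inject its own copy. The regular expression is a simplification of the real header grammar.

```python
import re
from email import message_from_string

# Hypothetical sample: one header from our MTA, one injected by the sender.
RAW = """\
Authentication-Results: mx.recipient.example;
\tspf=softfail smtp.mailfrom=brand.example;
\tdkim=none;
\tdmarc=fail header.from=brand.example
Authentication-Results: attacker.example; spf=pass; dkim=pass; dmarc=pass
From: Billing <billing@brand.example>
Subject: Invoice

Body text.
"""

METHOD_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def auth_verdicts(raw_message: str, trusted_authserv_id: str) -> dict:
    """Return spf/dkim/dmarc results from headers stamped by our own MTA."""
    msg = message_from_string(raw_message)
    found = {}
    for value in msg.get_all("Authentication-Results", []):
        flat = " ".join(value.split())  # unfold continuation lines
        parts = flat.split(";", 1)[0].split()
        if not parts or parts[0].lower() != trusted_authserv_id.lower():
            continue  # cannot attribute this header to our own MTA
        for method, result in METHOD_RE.findall(flat):
            found.setdefault(method.lower(), result.lower())
    # "missing" means our MTA recorded no verdict for that method.
    return {m: found.get(m, "missing") for m in ("spf", "dkim", "dmarc")}


verdicts = auth_verdicts(RAW, "mx.recipient.example")
failing = [m for m, r in verdicts.items() if r != "pass"]
print(verdicts, failing)
```

Filtering on the authserv-id is a design choice for this sketch: without it, the forged all-pass header in the sample would be indistinguishable from the genuine one.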

For systematic triage, parse headers with a dedicated tool. PhishTool, mailheader.org, and Microsoft’s free Message Header Analyzer all parse the chain into a sortable view. Save the parser output as evidence; it preserves the timing fingerprint better than a screenshot.

Layer 2, LLM-output classifiers

LLM-detection classifiers are imperfect: none of them claims better than ~85% accuracy under adversarial conditions in 2026. As one signal among three, though, they’re useful. Run the body of the email (not the headers) through at least two of them in parallel and treat agreement as the signal:

GPTZero: academic-flavoured; surfaces “perplexity” and “burstiness” scores you can attach to your incident note.
Originality.AI: has an API, useful for batch-running an entire mailbox suspected of compromise.
Pangram: newer entrant; especially good at picking up Claude-family outputs that GPTZero misses.

The cardinal mistake is treating any single classifier’s verdict as ground truth. Treat agreement between two as a strong signal, single-classifier hits as suggestive, and silence as inconclusive. Pair the classifier output with header forensics: an email that scores 91% AI-generated and fails DKIM is a different incident from one that scores 91% AI-generated but authenticates cleanly from a real corporate mailbox.
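The agreement rule can be written down as a small function. This sketch deliberately makes no API calls: ClassifierScore, classifier_signal, the classifier names, and the 0.8 threshold are hypothetical, and you would feed it whatever probabilities your chosen classifiers actually return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierScore:
    name: str              # label for the classifier that produced the score
    ai_probability: float  # 0.0 to 1.0, as reported by that classifier


def classifier_signal(scores, threshold: float = 0.8) -> str:
    """Map classifier scores to strong / suggestive / inconclusive.

    Assumption: a score at or above `threshold` counts as a hit. The 0.8
    default is illustrative, not a calibrated value.
    """
    hits = [s.name for s in scores if s.ai_probability >= threshold]
    if len(hits) >= 2:
        return "strong"        # two or more classifiers agree
    if len(hits) == 1:
        return "suggestive"    # single-classifier hit
    return "inconclusive"      # silence proves nothing either way


example = [
    ClassifierScore("classifier-a", 0.91),
    ClassifierScore("classifier-b", 0.88),
]
print(classifier_signal(example))
```

Keeping the output to three coarse labels makes it harder to over-read any one percentage in an incident note.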

The latter case (clean authentication, AI-generated body) is the canary for a different threat: account takeover. The attacker has access to a real mailbox (likely via stolen session cookies harvested by an infostealer; see our password-manager review for why session-replay is a 2026 problem) and is using an LLM to compose plausible follow-ups to an already-warm conversation. Header authentication will pass because the message really does come from the legitimate sender’s mailbox.

Layer 3, DKIM and SPF replay analysis

Headers also reveal a class of attack the body cannot. DKIM-signed message replay is when an attacker captures a legitimately-signed email and re-sends it (or its body, repackaged) to new recipients. Because the signature is valid, naive filters approve. Two checks catch most replays:

Date drift: compare the Date: header to the Received: timestamps. A legitimate email’s date matches the first hop within a few minutes. A replayed message often shows a Date: hours or days earlier than the relay timestamp.
DKIM body-hash recomputation: tools like dkimpy’s dkimverify or the opendkim-testmsg utility (part of opendkim-tools) re-verify the signature, which includes recomputing the body hash, and tell you whether the body bytes still match the signed hash. Subtle modifications (one inserted link, one renamed attachment) break the hash even when the rest of the message is intact.
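The date-drift check is easy to script with the standard library. The sketch below compares the Date: header with the timestamp on the bottom-most Received: line. The sample message and the ten-minute tolerance are assumptions for illustration, and it expects the usual Received: layout where the timestamp follows the last semicolon.

```python
from datetime import timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample: Date: is three days older than the first relay hop.
RAW = """\
Received: from relay.sender.example (relay.sender.example [198.51.100.7])
\tby mx.recipient.example with ESMTPS id def456;
\tTue, 05 May 2026 10:02:09 +0000
Date: Sat, 02 May 2026 09:00:00 +0000
From: Billing <billing@brand.example>
Subject: Invoice

Body text.
"""


def _aware(dt):
    # Assumption: treat timestamps without a usable offset as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def date_drift(raw_message: str) -> timedelta:
    """Return (first-hop Received time) minus (Date: header time)."""
    msg = message_from_string(raw_message)
    sent = _aware(parsedate_to_datetime(msg["Date"]))
    first_hop = msg.get_all("Received", [])[-1]      # bottom-most = earliest hop
    stamp = first_hop.rsplit(";", 1)[1].strip()      # timestamp follows the last ';'
    return _aware(parsedate_to_datetime(stamp)) - sent


TOLERANCE = timedelta(minutes=10)  # illustrative threshold, tune for your mail flow
drift = date_drift(RAW)
replay_suspected = abs(drift) > TOLERANCE
print(drift, replay_suspected)
```

A large drift is a prompt to look closer, not proof: queued mail and delayed retries can also produce a gap.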

For a quick check on Linux:

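# Verify the DKIM signature, including the body hash (from opendkim-tools)
opendkim-testmsg < suspect-email.eml

# Cross-check with dkimpy (Debian/Ubuntu package python3-dkim, or: pip install dkimpy)
sudo apt install -y python3-dkim
dkimverify < suspect-email.eml

# Show the signing domain (d=) and selector (s=) from the signature header
grep -i -A5 '^DKIM-Signature:' suspect-email.eml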

If the hash mismatches but the headers report dkim=pass at the receiving MTA, you likely have a replayed-and-modified email. Quarantine it and pull the original from the suspected source mailbox to compare.
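To see why a single inserted link breaks the hash, it helps to recompute the body hash by hand. The sketch below follows the simple and relaxed body canonicalisation rules from RFC 6376 and compares the result with the bh= tag. It is a simplified illustration: it assumes SHA-256, ignores the l= length tag, normalises bare LF to CRLF (an assumption for files saved on Unix), and does not verify the signature itself. The sample body and header value are hypothetical.

```python
import base64
import hashlib
import re


def canonicalize_body(body: bytes, mode: str = "relaxed") -> bytes:
    """Apply DKIM 'simple' or 'relaxed' body canonicalisation."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    if mode == "relaxed":
        # Collapse runs of spaces/tabs, then drop whitespace at line ends.
        lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in lines]
    while lines and lines[-1] == b"":
        lines.pop()  # ignore empty lines at the end of the body
    if not lines:
        return b"" if mode == "relaxed" else b"\r\n"
    return b"\r\n".join(lines) + b"\r\n"


def body_hash(body: bytes, mode: str = "relaxed") -> str:
    digest = hashlib.sha256(canonicalize_body(body, mode)).digest()
    return base64.b64encode(digest).decode("ascii")


def signed_body_hash(dkim_signature_value: str) -> str:
    """Extract the bh= tag from a DKIM-Signature header value."""
    match = re.search(r"(?:^|;)\s*bh=([^;]+)", dkim_signature_value)
    if match is None:
        raise ValueError("no bh= tag found")
    return re.sub(r"\s+", "", match.group(1))  # strip folding whitespace


original = b"Hi team,\r\nPlease see the attached invoice.\r\n"
tampered = original + b"Pay here: https://pay.example/invoice\r\n"
# Hypothetical header value; b= is a placeholder, not a real signature.
header_value = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=brand.example; s=sel1; "
    f"bh={body_hash(original)}; b=PLACEHOLDER"
)

original_matches = body_hash(original) == signed_body_hash(header_value)
tampered_matches = body_hash(tampered) == signed_body_hash(header_value)
print(original_matches, tampered_matches)
```

For real casework, prefer the verifiers above; this sketch only shows which bytes the hash covers.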

A 2026 triage flowchart

Three independent signal layers feed one verdict. Disagreement is itself information.
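One way to encode that flow is a small decision function over the three layers. This is a sketch of one possible ordering, not a definitive runbook: the Signals fields, the triage function, and the verdict strings are all hypothetical, and the branch order (replay first, then authentication against classifier output) is an assumption you may want to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    auth_passed: bool       # Layer 1: spf, dkim and dmarc all pass
    classifier: str         # Layer 2: "strong", "suggestive" or "inconclusive"
    replay_suspected: bool  # Layer 3: date drift or body-hash mismatch


def triage(s: Signals) -> str:
    """Combine the three layers into one coarse triage verdict."""
    if s.replay_suspected:
        return "quarantine: possible DKIM replay"
    if not s.auth_passed and s.classifier == "strong":
        return "likely AI-generated phishing from unauthenticated infrastructure"
    if s.auth_passed and s.classifier == "strong":
        # Clean authentication plus an AI-written body is the ATO canary.
        return "investigate possible account takeover"
    if not s.auth_passed:
        return "likely phishing, authorship unclear"
    return "no strong indicators, keep monitoring"


print(triage(Signals(auth_passed=True, classifier="strong", replay_suspected=False)))
```

The disagreement cases (authentication passes but the classifiers fire, or the reverse) get their own branches precisely because they point to different incidents.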

What “AI-assisted” means in incident notes

For incident-response writeups, distinguish between:

Human-authored phishing: older kits, machine-translated, often grammatically rough.
AI-assisted phishing: the operator drafts the lure and an LLM polishes it. Most common 2026 case.
Fully synthesised: the operator gives the LLM a target profile and the model produces the entire mail. Detected by classifier convergence and lack of historical thread context.
Account-takeover (ATO) impersonation: real mailbox compromise, with AI used to compose plausible follow-ups inside an already-warm thread. The hardest to catch from headers alone; it requires conversation-graph anomaly detection.

Each case implies a different downstream action. Synthesised lures need MTA reputation and DMARC tightening. ATO cases need session invalidation and a sweep of the compromised inbox for forwarded items. Don’t conflate them in the runbook.

Tooling stack, minimal 2026 setup

An EML parser in your incident-response toolkit. Options: mhonarc, Python’s built-in email module, or the eml-parser pip package.
A DKIM verifier: dkimpy or opendkim-tools on Linux.
API access to at least two LLM-detection classifiers. Don’t standardise on one; they fail in different ways.
A sandbox for any link or attachment you might detonate. See our malware sandbox tutorial.
A password manager with breach monitoring so any credentials seen in an ATO incident can be rotated immediately. See our picks.

A note on classifier failure modes

Three known false-positive patterns in 2026:

Translated text: non-native English run through DeepL or Google Translate often scores as AI-generated. Confirm with the headers (real corporate sender from a non-English-speaking country) before treating it as a hit.
Boilerplate corporate speak: auto-generated newsletters, contract templates, and legal disclaimers all read as “AI-generated” because they were originally produced from templates.
Auto-generated alerts: internal notification systems, ticketing-system summaries, and calendar invites with templated descriptions.

Always cross-reference the classifier output with the message’s transport metadata before escalating.
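A rough version of that cross-check can be automated. The sketch below looks for common automation markers (Auto-Submitted, List-Unsubscribe, and Precedence) and holds back escalation only when a classifier hit coincides with both clean authentication and automated-mail markers. The function names and the escalation rule are hypothetical, and the rule is intentionally conservative: an authenticated, non-automated message with a classifier hit still escalates, since that is the account-takeover pattern.

```python
from email import message_from_string

# Hypothetical newsletter headers for illustration.
NEWSLETTER = """\
From: News <news@brand.example>
List-Unsubscribe: <mailto:unsubscribe@brand.example>
Precedence: bulk
Subject: May product update

Templated body.
"""


def looks_automated(raw_message: str) -> bool:
    """Heuristic: does the message carry bulk or auto-generated markers?"""
    msg = message_from_string(raw_message)
    auto_submitted = (msg.get("Auto-Submitted") or "no").strip().lower()
    precedence = (msg.get("Precedence") or "").strip().lower()
    return (
        auto_submitted != "no"
        or "List-Unsubscribe" in msg
        or precedence in {"bulk", "list", "junk"}
    )


def should_escalate(classifier_hit: bool, auth_passed: bool, automated: bool) -> bool:
    """Escalate classifier hits unless metadata points to templated mail."""
    if not classifier_hit:
        return False
    if auth_passed and automated:
        return False  # likely a template false positive: note it, don't page anyone
    return True


automated = looks_automated(NEWSLETTER)
print(automated, should_escalate(True, True, automated))
```

These headers are sender-supplied, so they are only meaningful alongside a passing authentication result; that is why the rule checks both.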
