Shadow IT used to mean unsanctioned SaaS spend. Shadow AI is the same problem with a faster blast radius. Your developers are pasting proprietary code into ChatGPT to debug it. Your sales reps are summarising customer calls in Otter or Fireflies, including pricing details and strategy, and sometimes those summaries get retained for model training. Your legal team is feeding contract redlines to AI assistants whose data-retention policy nobody on the legal team has actually read.
The leakage is real and ongoing, and most security teams are flying blind on it. Here’s how to find it and what to do about it.
Where the data actually goes
Three buckets. First, consumer-tier accounts on ChatGPT, Claude, Gemini, Perplexity, and friends: most of these still default to using submitted data for model improvement unless the user actively opts out. Second, enterprise-tier accounts at the same vendors: these generally don’t train on customer data, but employees often have personal accounts in the same browser. Third, long-tail AI tools: every Chrome extension, every “AI for X” SaaS, every specialty assistant, most of which have data-handling policies that range from acceptable to actively bad.
Once data leaves your perimeter into one of these surfaces, you have effectively zero ability to recall it. The only winning move is not letting it leave in the first place, which requires knowing it’s leaving in the first place.
How to actually measure shadow AI usage
The cleanest signal is DNS and proxy logs. AI tool domains are well-documented (chatgpt.com and the older chat.openai.com, claude.ai, gemini.google.com, perplexity.ai, copilot.microsoft.com, plus a long tail of specialty tools). A simple report: how many users hit each domain over the past month, segmented by department. Most companies are surprised by both the volume and the spread across departments that “don’t use AI.”
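A minimal sketch of that report, assuming DNS logs have already been exported as (user, queried name) rows and that a user-to-department mapping is available from HR or identity data. The row shape, the AI_DOMAINS watch list, and the function names are illustrative assumptions, not any particular resolver's log format.

```python
from collections import defaultdict

# Illustrative watch list (an assumption); extend it with the long tail
# that matters in your environment.
AI_DOMAINS = {
    "chat.openai.com",
    "chatgpt.com",
    "claude.ai",
    "gemini.google.com",
    "perplexity.ai",
    "copilot.microsoft.com",
}


def matched_ai_domain(queried_name):
    """Return the watched domain that queried_name equals or sits under, else None."""
    name = queried_name.lower().rstrip(".")
    for domain in AI_DOMAINS:
        if name == domain or name.endswith("." + domain):
            return domain
    return None


def users_per_domain_by_department(dns_rows, department_of):
    """Count distinct users per (department, AI domain).

    dns_rows: iterable of (user, queried_name) tuples (assumed export format).
    department_of: dict mapping user -> department (assumed lookup table).
    """
    seen = defaultdict(set)
    for user, queried_name in dns_rows:
        domain = matched_ai_domain(queried_name)
        if domain is None:
            continue  # not an AI destination we are tracking
        department = department_of.get(user, "unknown")
        seen[(department, domain)].add(user)
    return {key: len(users) for key, users in seen.items()}


if __name__ == "__main__":
    rows = [
        ("alice", "claude.ai"),
        ("alice", "claude.ai."),
        ("bob", "www.perplexity.ai"),
        ("carol", "example.com"),
    ]
    departments = {"alice": "legal", "bob": "finance", "carol": "sales"}
    report = users_per_domain_by_department(rows, departments)
    for (dept, domain), count in sorted(report.items()):
        print(dept, domain, count)
```

Counting distinct users rather than raw queries keeps one heavy user from masking how widely a tool has spread across a department.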
Network proxies and CASB tools (Netskope, Zscaler, Cisco Umbrella, even Cloudflare Zero Trust) can extend that into request-level visibility: what file types are being uploaded, what byte volume per session, and which destinations are seeing the most traffic. The more granular the data, the better the conversation with leadership.
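As a rough illustration of what request-level data makes possible, the sketch below totals upload bytes by destination and file extension. The record fields (user, host, method, filename, bytes_out) are hypothetical. Real proxy and CASB exports use vendor-specific schemas, so treat this as a shape to map your own data onto.

```python
import os
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProxyRecord:
    # Hypothetical fields; map these from whatever your proxy or CASB exports.
    user: str
    host: str
    method: str
    filename: str  # empty string when the request carried no named file
    bytes_out: int


def upload_bytes_by_destination_and_type(records):
    """Total outbound bytes per (host, file extension) for upload-style requests."""
    totals = Counter()
    for rec in records:
        if rec.method.upper() not in {"POST", "PUT"}:
            continue  # only methods that normally carry a request body count as uploads
        ext = os.path.splitext(rec.filename)[1].lower() or "(none)"
        totals[(rec.host, ext)] += rec.bytes_out
    return totals
```

Calling most_common() on the returned Counter lists the heaviest destination and file-type pairs first, which is usually the ordering leadership wants to see.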
The three controls that actually work
1. Provide an approved alternative. Banning AI is not a strategy. Block consumer ChatGPT and tell developers to use Copilot Business or an enterprise-tier ChatGPT account, and the leakage drops 80% overnight. Without an approved tool, employees route around the block to personal devices, which is worse.
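2. DLP at the upload boundary. Modern DLP can inspect outbound web traffic for sensitive content patterns (source code, PII, financial data) and block uploads to known AI domains. Because that traffic is HTTPS, inspection has to happen where the content is readable: at a TLS-inspecting proxy or in an endpoint or browser agent. The false-positive rate is real but tunable. Even a noisy first deployment is better than nothing.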
3. A short, clear policy. Three pages, written in plain language. What can go into approved tools (general questions, public information, sanitised work). What can’t (customer data, source code containing IP, financial details, anything covered by a confidentiality agreement). What happens if you violate (training, then escalation). Most employees comply when the rules are obvious.
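The content inspection in control 2 can be sketched as pattern matching on an outbound request body. This is a toy illustration, not how any specific DLP product works: the patterns, the domain list, and the should_block function are all assumptions, and production rules need far more tuning to keep false positives manageable.

```python
import re

# Hypothetical destination list; a real deployment would lean on the
# proxy vendor's category feed instead of a hand-kept set.
AI_DOMAINS = {
    "chatgpt.com",
    "chat.openai.com",
    "claude.ai",
    "gemini.google.com",
    "perplexity.ai",
}

# Deliberately simple patterns. Each one trades recall against false positives.
PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive(text):
    """Return the sorted names of every pattern that matches text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def should_block(host, body):
    """Return (block, hits): block only when host is a watched AI domain
    and the body matches at least one sensitive pattern."""
    host = host.lower().rstrip(".")
    is_ai = any(host == d or host.endswith("." + d) for d in AI_DOMAINS)
    hits = find_sensitive(body) if is_ai else []
    return bool(hits), hits
```

Returning the matched pattern names alongside the decision gives the tuning loop something to work with: a rule that fires constantly on harmless text is the first candidate to tighten.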
The audit you should run this week
Pull thirty days of DNS logs for AI tool domains. Sort by volume per user. Identify the top fifty users by query volume. Sample the actual proxy traffic for ten of them: what file types are they uploading, and what’s the byte volume? Forty per cent of the time, you’ll find at least one case where someone in finance, legal, or engineering is uploading what is clearly sensitive material to a consumer-tier AI account.
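The ranking and sampling steps can be sketched as follows, assuming the DNS rows have already been filtered to AI tool domains and the thirty-day window. Choosing the ten at random is an assumption made here; the audit works just as well if you review the ten heaviest users instead. Function names and the row format are illustrative.

```python
import random
from collections import Counter


def top_users_by_queries(dns_rows, top_n=50):
    """Return up to top_n (user, query_count) pairs, highest count first.

    dns_rows: iterable of (user, queried_name) tuples, assumed to be
    pre-filtered to AI tool domains and the audit window.
    """
    counts = Counter(user for user, _ in dns_rows)
    return counts.most_common(top_n)


def sample_for_review(ranked_users, sample_size=10, seed=None):
    """Pick users at random from the ranked list for manual proxy-traffic review."""
    users = [user for user, _ in ranked_users]
    rng = random.Random(seed)  # pass a seed to make the selection reproducible
    return rng.sample(users, min(sample_size, len(users)))
```

Recording the seed with the audit notes lets someone else reproduce exactly which users were selected for review.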
That finding is your business case for the controls above. Without that finding, the conversation with leadership is hypothetical. With it, the conversation is concrete and the budget conversation is short.
Bottom line
Shadow AI isn’t a future problem. It’s a present problem with measurement gaps. Visibility first, alternatives second, controls third. In that order, in the next month, the gap is closeable.
