Ransomnews
AI

Model Theft and IP: What Happens When Your AI Gets Stolen

By Martynas Vareikis, April 26, 2026

A frontier large language model represents tens to hundreds of millions of dollars of compute, terabytes of training data, and years of engineering refinement. The model weights are, in commercial terms, the most valuable asset of the company that produced them. They are also surprisingly hard to protect in practice. The threat of model theft has moved from theoretical to operational, with documented attempts and successful extractions across multiple categories of models.

The defensive landscape is uneven, and the trajectory of regulation and best practice is still being written.

The categories of model theft

Three distinct attack patterns exist:

Direct weight exfiltration. Theft of the actual model weights, a file or files representing the complete trained network. Requires access to the systems where the model is stored or served. Most damaging because it gives the attacker exactly what the original developer has.

Model extraction (also called model stealing). The attacker queries the deployed model through whatever API is available and uses the responses to train a substitute model that approximates the target’s behaviour. This does not yield the exact weights but produces a functional clone. Pioneered against prediction APIs for classifiers and other conventional ML models (Tramèr et al. 2016) and now well-developed for LLMs.
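To make the idea concrete, here is a minimal sketch of the simplest case: a logistic-regression service that returns full confidence scores. In that setting the parameters can be solved for directly, in the spirit of the equation-solving attacks described by Tramèr et al. The function names (`make_api`, `extract_linear_model`) and the secret values are hypothetical, and extraction against an LLM is far more approximate than this toy suggests.

```python
import math

def make_api(weights, bias):
    """Hypothetical black-box service: callers see scores, never the parameters."""
    def predict_proba(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba

def logit(p):
    """Inverse of the sigmoid; turns a returned probability back into a raw score."""
    return math.log(p / (1.0 - p))

def extract_linear_model(api, dim):
    """Recover (weights, bias) with dim + 1 queries: the origin and each unit vector."""
    bias = logit(api([0.0] * dim))
    weights = []
    for i in range(dim):
        unit = [0.0] * dim
        unit[i] = 1.0
        weights.append(logit(api(unit)) - bias)
    return weights, bias

# Illustrative secret parameters (assumed values, not from any real system).
secret_w, secret_b = [0.8, -1.5, 0.3], 0.25
api = make_api(secret_w, secret_b)
stolen_w, stolen_b = extract_linear_model(api, dim=3)
```

Four queries are enough here because the API returns unrounded probabilities. That is one reason the output-side defences discussed later focus on limiting how much numeric detail each response carries.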

Distillation. A specific form of extraction where the attacker explicitly trains a smaller "student" model to imitate the larger "teacher" model. The technique is widely used legitimately (model compression) and increasingly used adversarially.
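As an illustration of what "imitating the teacher" means numerically, the sketch below computes a temperature-softened KL-divergence loss between teacher and student logits, the formulation commonly used in legitimate model compression. It assumes the attacker can see teacher logits; against a text-only API an attacker would typically train on sampled outputs instead. All names and values are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature that flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumed example logits for a three-class output.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss_mismatch = distillation_loss(teacher, student)
loss_match = distillation_loss(teacher, teacher)
```

Minimising this loss over many inputs pulls the student’s output distribution towards the teacher’s, which is why the same mechanics serve both compression and cloning.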

A fourth, lower-level concern: training-data extraction, where the attacker recovers individual training examples from the model. Demonstrated against GPT-2 (Carlini et al. 2021) and continues to be a research topic. Less about stealing the model than about extracting sensitive information embedded in it.

Documented incidents

The publicly known cases give a sense of the landscape:

LLaMA leak (March 2023). Meta’s first-generation LLaMA model was distributed under a research-only licence that required approval. Within about a week of the restricted release, the weights were posted on 4chan and rapidly mirrored across torrent sites and Hugging Face. The leak is generally attributed to someone with approved access redistributing the weights rather than to a breach of Meta’s own systems. Meta subsequently released Llama 2 under a more permissive licence, which some observers read as partly a response.

Mistral 7B early leak (August 2023). Pre-release weights reportedly circulated before the model was released openly shortly afterward.

Accidental early releases of other open-weight models (Mosaic / DBRX, Falcon, and others). Multiple incidents through 2023-2024 involved models intended for staged release leaking earlier than planned.

Closed-API extraction research. Carlini et al., "Stealing Part of a Production Language Model" (2024), showed that the logit-bias and log-probability options exposed by production LLM APIs leak structural information about the model. Using carefully constructed queries, the authors recovered the final embedding-projection matrix of OpenAI’s ada and babbage models and the hidden dimension of gpt-3.5-turbo. OpenAI and Google changed their APIs in response.
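The core observation behind the projection-matrix attack is linear-algebraic: every logit vector is the final projection matrix applied to a hidden state, so a stack of logit vectors has rank no greater than the hidden dimension. The toy below illustrates that point only. The sizes, the random stand-in model, and the elimination-based rank routine are assumptions for illustration; the published attack works against real APIs that do not return full logits and uses a singular-value analysis rather than this simplified rank count.

```python
import random

VOCAB, HIDDEN, QUERIES = 12, 4, 10  # toy sizes, chosen only for illustration
rng = random.Random(0)

# Secret final projection (VOCAB x HIDDEN); the "attacker" code never reads it.
W = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

def query_logits(_prompt):
    """Stand-in API: a random hidden state per prompt, projected to vocabulary logits."""
    h = [rng.gauss(0, 1) for _ in range(HIDDEN)]
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def matrix_rank(rows, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

observed = [query_logits(i) for i in range(QUERIES)]
recovered_dim = matrix_rank(observed)
```

Even though each response has 12 numbers, the stacked responses only span a 4-dimensional subspace, which reveals the hidden size without any access to the weights.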

Distillation of frontier models. The training of competitive open-weight models on datasets generated by closed models is widely understood to have happened across 2023-2024. Specific instances are hard to confirm because the practice is somewhere between "questionable" and "violating ToS" depending on the case. OpenAI’s terms of service prohibit using ChatGPT outputs to train competing models; enforcement is necessarily limited.

Insider-driven theft attempts. Several US companies have alleged in court filings that departing engineers exfiltrated model weights or training data. These filings provide some public detail, though most disputes are settled or sealed.

Why model weights are hard to protect

The structural difficulties:

The model is a file. Once an attacker has read access, they can copy it. Standard file-system access controls work, but in a complex training environment many people and many automated systems need read access at various points.

The model is queryable. Even without access to weights, the deployed model is a function the attacker can interact with. Every API response exposes some information, and aggregating enough queries can produce an approximate functional clone.

The model can be embedded in other systems. Once integrated into a product, the weights may be distributed alongside the product to customer systems. Edge deployments and on-device inference particularly create distribution.

Traditional DRM does not work well. The model weights, by their nature, must be loaded into memory and computed against. Cryptographic protection of weights at rest works; protection during inference is much harder.

The provenance of derivative work is hard to prove. If a competitor releases a model that performs similarly to yours, proving they trained it on your output, distilled from your model, or copied your weights is technically difficult.

Defences against direct exfiltration

The most tractable category. Standard infosec hygiene:

Access controls on storage. Model weights stored with strong access controls; audit logging on every read; alerts on unusual access patterns.
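A minimal sketch of the "alerts on unusual access patterns" idea follows, assuming read events are already collected as (principal, path) pairs for a fixed time window. The function name, thresholds, and principals are hypothetical; a production system would draw on the storage platform’s own audit log and a richer baseline.

```python
from collections import Counter

def flag_unusual_readers(events, baseline, factor=3, floor=5):
    """Flag principals whose read count in this window exceeds their usual volume.

    events:   iterable of (principal, checkpoint_path) read events
    baseline: dict of principal -> typical reads per window (missing means 0)
    A principal is flagged when reads > max(floor, factor * baseline).
    """
    counts = Counter(principal for principal, _ in events)
    flagged = {}
    for principal, n in counts.items():
        limit = max(floor, factor * baseline.get(principal, 0))
        if n > limit:
            flagged[principal] = n
    return flagged

# Assumed example window: a training job, a researcher, and an unfamiliar account.
events = (
    [("train-job", "ckpt/step-100")] * 10
    + [("alice", "ckpt/step-100")] * 2
    + [("contractor-7", f"ckpt/step-{i}") for i in range(40)]
)
baseline = {"train-job": 8, "alice": 1}
flagged = flag_unusual_readers(events, baseline)
```

The floor keeps new or rarely active accounts from alerting on their first few reads, while the multiplier tolerates normal variation for busy service accounts.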

Encryption at rest. Weights encrypted with keys that require active authentication to use. Particularly relevant for weights distributed in client devices or cloud appliances.

Watermarking. Embed identifying patterns in the model weights or behaviour that allow you to identify your model if it appears elsewhere. Active research area; multiple techniques (parameter-level watermarks, behaviour-level watermarks, training-data watermarks). Some are robust to fine-tuning; some are not.

Insider threat programs. Many of the publicly reported model-theft incidents involve insiders or approved recipients of the weights. Standard insider-threat tooling (DLP, behavioural monitoring, exit interviews) applies.

Air-gap and physical security for high-value training environments. The frontier-model labs (Anthropic, OpenAI, Google DeepMind, Meta FAIR, Microsoft Research) maintain secure compute environments with substantial physical and procedural protections. Smaller organisations training competitive models often have weaker controls.

Defences against extraction and distillation

Harder. The defences are about raising attacker cost, not preventing extraction:

Rate limiting and query monitoring. Detect query patterns characteristic of extraction attempts (large query volumes, systematic exploration of input space, queries from known research groups).
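One simple building block for this is a per-key sliding-window limiter, sketched below. The class name and limits are illustrative assumptions; detecting systematic exploration of the input space would need analysis of query content on top of raw volume.

```python
from collections import defaultdict, deque

class QueryMonitor:
    """Sliding-window limiter: at most max_queries per key in any window_seconds span."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)  # api_key -> timestamps of accepted queries

    def allow(self, api_key, now):
        q = self.history[api_key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_queries:
            return False  # over the limit: reject, and optionally raise an alert
        q.append(now)
        return True

monitor = QueryMonitor(max_queries=3, window_seconds=60)
decisions = [monitor.allow("key-123", t) for t in (0, 1, 2, 3, 61)]
```

The fourth request is rejected because three are already inside the window; by second 61 the two oldest have expired, so the key is served again.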

Output perturbation. Add noise to model outputs to make extraction harder. Trade-off: degrades legitimate use.
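The trade-off can be seen in a small sketch that adds Gaussian noise to a probability vector, renormalises it, and rounds the result. The noise scale, rounding precision, and labels are assumptions chosen for illustration, not recommended settings.

```python
import random

def perturb_probs(probs, scale=0.02, decimals=2, rng=None):
    """Return a noisy, renormalised, rounded copy of a label -> probability mapping."""
    rng = rng or random.Random()
    # Add zero-mean noise and clamp so no probability goes negative.
    noisy = {k: max(p + rng.gauss(0.0, scale), 1e-9) for k, p in probs.items()}
    total = sum(noisy.values())
    return {k: round(v / total, decimals) for k, v in noisy.items()}

clean = {"cat": 0.8731, "dog": 0.1012, "fox": 0.0257}
served = perturb_probs(clean, rng=random.Random(7))
top_label = max(served, key=served.get)
```

With a small scale the top label survives, so ordinary users see the same answer, while the fine-grained scores an extraction attack relies on become less precise. Larger noise protects more and degrades legitimate use more.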

Watermarking outputs. Embed statistical signatures in generated text that allow you to detect outputs used to train derivative models. Kirchenbauer et al.’s "A Watermark for Large Language Models" (2023) at arxiv.org/abs/2301.10226 is the foundational paper. OpenAI and Google have publicly discussed deploying watermarks; effectiveness varies.
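The detection side of a green-list watermark in the style of Kirchenbauer et al. can be sketched as a one-proportion z-test: count how many tokens fall in the pseudo-random "green" set keyed on the previous token, and compare with the fraction expected by chance. The version below is a simplification under stated assumptions: it hashes (previous token, token) pairs instead of partitioning a real tokenizer vocabulary, uses a made-up key and toy vocabulary, and generates text by always choosing a green token.

```python
import hashlib
import math
import random

def is_green(prev_token, token, gamma=0.5, key="demo-key"):
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermark_z_score(tokens, gamma=0.5, key="demo-key"):
    """z = (green_count - gamma*T) / sqrt(T*gamma*(1-gamma)) over T scored tokens."""
    t = len(tokens) - 1  # the first token has no predecessor to seed from
    if t <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, c, gamma, key) for p, c in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

def generate_marked(vocab, length, rng, gamma=0.5, key="demo-key"):
    """Toy 'hard' watermark: always pick the next token from the green list."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        greens = [w for w in vocab if is_green(tokens[-1], w, gamma, key)]
        tokens.append(rng.choice(greens or vocab))
    return tokens

vocab = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary
marked = generate_marked(vocab, 61, random.Random(1))
z_marked = watermark_z_score(marked)
```

Unwatermarked text should score near zero on average, whereas text generated with the key scores several standard deviations above it. Paraphrasing or heavy editing lowers the score, which is one reason effectiveness varies in practice.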

Access control on high-value APIs. Embedding APIs, fine-tuning APIs, and other endpoints that disclose more model information are more tightly controlled than completion APIs.

Legal protection through ToS. The contracts used by major LLM API providers explicitly prohibit using outputs to train competing models. Enforcement is partial but real.

Defences against training-data extraction

A separate concern. The risk: a deployed model leaks individual training examples, including sensitive personal data, copyrighted text, or proprietary information.

Differential privacy in training. Adds noise during training to bound information leakage. Costly but produces provable guarantees. Production deployments are limited.
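The "adds noise during training" step usually refers to DP-SGD: clip each example’s gradient to a fixed norm, sum, add Gaussian noise scaled to that norm, and average. The sketch below shows only that aggregation step on plain Python lists, with illustrative values; it omits the privacy accounting needed to state an actual guarantee.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise is calibrated to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

# Assumed toy gradients: one large (gets clipped), one small (left unchanged).
grads = [[3.0, 4.0], [0.3, 0.4]]
no_noise = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=0.0,
                               rng=random.Random(0))
with_noise = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                                 rng=random.Random(0))
```

Clipping bounds how much any single example can move the model, and the noise masks what remains. Both reduce accuracy, which is part of why the technique is described as costly.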

Data filtering. Remove sensitive examples from training data before training. Standard practice; never complete.

Output filtering. Detect and refuse outputs that match training-data segments verbatim. Carlini-style extraction attacks specifically circumvent this, but it raises the bar.
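A basic version of such a filter checks whether any run of n consecutive words in the output also appears in a protected corpus. The sketch below uses whitespace tokenisation and a made-up example record; both are simplifying assumptions, and small changes in wording or formatting will slip past an exact-match check like this one.

```python
def ngrams(tokens, n):
    """All runs of n consecutive tokens, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(protected_docs, n=5):
    """Index every n-gram of the protected documents (lower-cased, whitespace-split)."""
    index = set()
    for doc in protected_docs:
        index |= ngrams(doc.lower().split(), n)
    return index

def leaks_verbatim(output, index, n=5):
    """True if the output shares at least one n-gram with the protected corpus."""
    return not ngrams(output.lower().split(), n).isdisjoint(index)

# Hypothetical protected training record.
index = build_index(["The patient John Doe was admitted on 3 March with chest pain"])
blocked = leaks_verbatim("Records show John Doe was admitted on 3 March.", index)
allowed = leaks_verbatim("A man named Doe came in during March", index)
```

The second output conveys similar information without matching any five-word run, which shows both why the filter raises the bar and why it does not close the gap.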

Membership inference defences. Adversarial training against attackers attempting to determine whether a specific example was in the training set.

The IP and legal landscape

Several developments through 2024-2025 matter:

Trade-secret protection of model weights is increasingly tested in court. Disputes over engineers departing AI labs test the boundaries.

Patent protection of model architectures and training techniques is an active area.

Copyright treatment of model weights is unsettled. The broader debates over whether AI-generated output can be copyrighted frame parts of this.

Trade-secret-style protection appears to be the most operationally effective. Patents have not been heavily asserted yet.

The EU AI Act and related regulations require certain transparency about training data and capabilities; the conflict between transparency and IP protection is a central tension.

What organisations should do

For frontier-model developers:

Treat model weights as the highest-tier intellectual property. Access controls equivalent to source code or financial systems.

Implement watermarking for both weights (where feasible) and outputs.

Monitor for extraction attempts at the API layer.

Maintain forensic visibility: what model checkpoints exist, where they are stored, who has accessed them, and when.
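The "what exists and where" part of that inventory can start as a hash manifest of the checkpoint directory, sketched below. The directory layout and file name are hypothetical, and the sketch deliberately covers only inventory and integrity; who accessed a checkpoint and when has to come from the storage system’s audit logs.

```python
import hashlib
import os
import tempfile

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte checkpoints do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 digest and size."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = {
                "sha256": sha256_file(path),
                "bytes": os.path.getsize(path),
            }
    return manifest

# Demo on a throwaway directory with a stand-in checkpoint file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "step-100.bin"), "wb") as f:
        f.write(b"weights")
    manifest = build_manifest(root)
```

A stored manifest also gives investigators a fingerprint to compare against if a suspected copy of the weights surfaces elsewhere.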

Track legal protections in your jurisdiction. Trade-secret status often requires demonstrable protective measures.

For organisations deploying AI:

Understand the licence terms of the models you use. Open-weight, research-only, commercial-restricted, and fully open licences each have different implications.

Treat fine-tuned model weights as containing potentially sensitive information from your fine-tuning data. Protect accordingly.

Audit access to inference endpoints; rate-limit aggressively if extraction is a concern.

Recognise the risk of building business-critical systems on closed APIs that may change unpredictably; weight-availability planning is part of architecture.

The deeper observation: models are software, weights are software artifacts, and they require the same supply-chain and access-control discipline as any other valuable software asset. The treatment will mature; the risks are present today.

Martynas Vareikis

Martynas Vareikis is the AI Editor at Ransomnews. He covers the intersection of artificial intelligence and information security — from machine-learning models in defensive tooling to the adversarial use of LLMs by ransomware operators, deepfake-driven social engineering, and the rise of agentic threats. His reporting focuses on translating fast-moving AI research into practical guidance for defenders, journalists, and the broader security community. Reach Martynas via [email protected].
