Anthropic Accuses DeepSeek of AI Model Theft via Distillation Attacks

Key Highlights

  • Anthropic accuses China-based AI firms DeepSeek, Moonshot, and MiniMax of unauthorized distillation attacks on Claude models.
  • The company detected industrial-scale data extraction campaigns using fraudulent API accounts.
  • New detection and fingerprinting systems have been deployed to identify and block such behavior.
  • Anthropic is sharing technical indicators with other AI labs and regulators to strengthen AI security collaboration.

U.S.-based artificial intelligence company Anthropic has accused several Chinese AI firms—DeepSeek, Moonshot, and MiniMax—of attempting model distillation attacks on its flagship system, Claude. The alleged campaigns involved large-scale data extraction designed to mimic Claude’s reasoning and behavior without permission.

According to Anthropic, investigators identified millions of coordinated API queries aimed at replicating Claude’s capabilities in reasoning, agentic coding, and tool orchestration. The company has since launched advanced countermeasures and shared technical indicators across the AI ecosystem.

What Is a Distillation Attack?

Distillation, a familiar concept in machine learning, involves training a smaller “student” model to reproduce the output patterns of a larger “teacher” model. It helps optimize performance for lightweight deployment. But when used without consent, it can turn into intellectual property theft.

Anthropic explained that unauthorized distillation effectively allows competitors to extract proprietary safety mechanisms, performance optimizations, and alignment tuning from frontier AI systems—without investing in equivalent research or computing resources. In essence, distillation attacks blur the line between innovation and imitation.
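The core mechanic is simple: the student is trained not on ground-truth labels but on the teacher's full output distribution. A minimal sketch of the standard distillation loss (all values here are toy numbers; the temperature and logits are illustrative, not from any real model):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions.

    Minimizing this trains the student to copy the teacher's output
    *patterns* -- relative preferences among answers -- not just its
    top-1 choices, which is what makes distillation so data-efficient.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits already match the teacher's incurs zero loss;
# any mismatch produces a positive loss the student can descend on.
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))        # 0.0
print(distillation_loss(teacher, [0.2, 1.0, 3.0]) > 0)  # True
```

In an unauthorized setting, the "teacher logits" are approximated from the target model's API responses, which is why the alleged campaigns required millions of queries.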

Anthropic’s Allegations Against DeepSeek

Anthropic said DeepSeek conducted more than 150,000 targeted exchanges with Claude, focusing on reasoning and reward modeling. These exchanges spanned tasks such as rubric-based grading and politically sensitive query rephrasing, suggesting an effort to reconstruct Claude’s decision-making and censorship-avoidance patterns.

The company reports that DeepSeek coordinated parallel traffic from multiple synchronized accounts, all displaying identical usage patterns and payment details—a clear indicator of automation. Metadata reportedly traced the activity back to researchers at DeepSeek’s lab, confirming intent rather than coincidence.

Moonshot and MiniMax Also Named

The alleged distillation efforts were not confined to one lab.
Anthropic accused Moonshot AI of executing more than 3.4 million interactions related to complex reasoning, coding, and computer vision tasks. This included hundreds of fraudulent accounts designed to mask traffic coordination.

Meanwhile, MiniMax reportedly launched a massive 13-million-query operation, primarily targeting Claude’s agentic coding and tool orchestration features. Anthropic detected this effort while still active, before MiniMax’s in-training model reached release, enabling the company to halt data siphoning in progress.

How Anthropic Detected the Attacks

Anthropic leveraged behavioral analytics, metadata tracking, and response pattern mapping to distinguish normal user activity from systematic extraction patterns.
Suspicious indicators included:

  • High-frequency, template-based questioning
  • Repetitive instruction formats across multiple accounts
  • Shared payment sources pointing to lab coordination
  • Load-balanced querying timed for consistent throughput
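Anthropic has not published its detection code, but the indicators above can be combined into a simple heuristic: group traffic by shared payment source, then flag groups with many accounts, few distinct prompt templates, and high sustained volume. A minimal sketch under those assumptions (the field names, thresholds, and sample data are all hypothetical):

```python
from collections import defaultdict

def flag_coordinated_accounts(requests, min_accounts=3, min_volume=100):
    """Flag payment sources shared by multiple accounts that sustain
    template-like, high-volume querying.

    `requests` is a list of dicts with hypothetical fields:
    account_id, payment_hash, prompt_template, count.
    """
    by_payment = defaultdict(list)
    for r in requests:
        by_payment[r["payment_hash"]].append(r)

    flagged = []
    for payment, group in by_payment.items():
        accounts = {r["account_id"] for r in group}
        templates = {r["prompt_template"] for r in group}
        volume = sum(r["count"] for r in group)
        # Many accounts + few templates + high volume is consistent with
        # coordinated extraction rather than organic, varied usage.
        if len(accounts) >= min_accounts and len(templates) <= 2 and volume >= min_volume:
            flagged.append(payment)
    return flagged

sample = [
    {"account_id": "a1", "payment_hash": "pay-X", "prompt_template": "grade-rubric-v1", "count": 60},
    {"account_id": "a2", "payment_hash": "pay-X", "prompt_template": "grade-rubric-v1", "count": 70},
    {"account_id": "a3", "payment_hash": "pay-X", "prompt_template": "grade-rubric-v1", "count": 50},
    {"account_id": "b1", "payment_hash": "pay-Y", "prompt_template": "casual-chat",     "count": 5},
]
print(flag_coordinated_accounts(sample))  # ['pay-X']
```

Real systems would layer many more signals (timing correlation, response-pattern mapping, metadata), but the grouping-and-thresholding shape is the same.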

The company’s detection systems flagged traffic anomalies consistent with distillation at scale, ultimately allowing attribution to distinct research networks.

Anthropic’s Response: New Defenses and Collaboration

Anthropic said it is now investing heavily in anti-distillation infrastructure, deploying classifiers and fingerprinting tools across its cloud interfaces. The goal is to make such attacks harder to execute and easier to detect while preserving customer experience.

The firm also announced collaboration with trusted AI labs, cloud providers, and regulatory bodies to identify coordinated extraction behaviors.
“We’re strengthening access controls for educational users, startups, and security research accounts—these pathways were repeatedly exploited,” the company noted.

On the model level, Anthropic is introducing output-level distortions and trace metadata that make its responses less effective for mimicking through student training pipelines. This prevents misuse while maintaining transparency for legitimate researchers.
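Anthropic has not disclosed how its fingerprinting works. One well-known family of techniques it gestures at is keyed output watermarking: the serving layer deterministically chooses between equivalent phrasings using a secret key, so the choices look natural to readers but leave a statistical bias that a student model trained on the outputs inherits. A toy sketch of that idea (the key, variant pairs, and function are invented for illustration, not Anthropic's actual mechanism):

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-fingerprint-key"  # assumption: a provider-held secret

# Interchangeable phrasings a serving layer could rotate between.
VARIANTS = [
    ("in addition", "furthermore"),
    ("for example", "for instance"),
]

def pick_variant(pair, context):
    """Deterministically pick one of two equivalent phrasings using a keyed
    hash of the surrounding context.

    Harmless for human readers, but a student model distilled from many
    fingerprinted outputs reproduces the keyed bias, which the key holder
    can later detect with a statistical test on the suspect model.
    """
    digest = hmac.new(SECRET_KEY, context.encode("utf-8"), hashlib.sha256).digest()
    return pair[digest[0] % 2]

for pair in VARIANTS:
    print(pick_variant(pair, "some surrounding response text"))
```

Because the selection is keyed, third parties cannot strip the fingerprint without knowing which choices carry it, while legitimate researchers see only ordinary, fluent text.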

Industry Implications

The accusations underscore a growing challenge in the AI arms race: how to protect advanced systems from imitation at scale. As AI firms invest billions in developing frontier models, distillation attacks threaten competitive integrity and raise global concerns about IP enforcement across borders.

Experts believe Anthropic’s move to publicize technical evidence may influence future AI regulation and data protection norms, especially between U.S.- and China-based labs operating in overlapping domains.

In Conclusion

With Anthropic sounding alarms about unauthorized model distillation, the spotlight turns again to AI ethics, cybersecurity, and data sovereignty. While cooperation among labs is increasing, the risks of model cloning remain significant.

For Anthropic, the Claude system represents not just technological progress but a test case for protecting proprietary intelligence in a rapidly evolving AI landscape.
