On April 7, 2026, Anthropic released Claude Mythos Preview, a frontier model that scores 93.9% on SWE-bench Verified (Anthropic, 2026). It's not available to the public. Mythos exists for one purpose: finding and fixing security vulnerabilities at scale through a coalition called Project Glasswing. Here's what developers need to know about the model, the benchmarks, and what it means for the broader Claude lineup.
What Is Claude Mythos?
Claude Mythos is Anthropic's most capable model, sitting above Opus in its model hierarchy and scoring 77.8% on SWE-bench Pro compared to Opus 4.6's 53.4% (Anthropic, 2026). It's a research preview, not a product. Anthropic has stated explicitly that there are no plans for general availability.
The name comes from the Ancient Greek word for "utterance" or "narrative." That's a fitting choice for a model whose primary job is to tell a story about code, specifically where that code is broken and exploitable.
Why It Won't Be Publicly Released
Mythos can surpass "all but the most skilled humans" at finding and exploiting software vulnerabilities (Anthropic, 2026). That capability makes it extraordinarily useful for defense. It also makes it dangerous in the wrong hands.
Anthropic's position is clear: the offensive potential outweighs the benefits of broad access. Instead of releasing it, they've built a controlled deployment model through Project Glasswing, restricting usage to vetted security partners.
This is a notable departure from the typical AI release playbook. Most labs race to ship capabilities broadly. Anthropic is arguing that some capabilities shouldn't be shipped at all, at least not without new safeguards in place.
What About Future Access?
Anthropic has signaled that "Mythos-class" capabilities will eventually make their way into a future Opus model. The caveat: new safeguards need to be developed first. No timeline has been given.

Project Glasswing and the Coalition
Project Glasswing is a defensive cybersecurity coalition of over 40 organizations, backed by up to $100M in committed usage credits (Anthropic, 2026). It launched alongside Mythos on April 7, 2026, and represents the largest coordinated AI-for-security effort to date.
The named partners include some of the biggest names in tech and security:
- Cloud and platform: AWS, Google, Microsoft
- Hardware: Apple, Broadcom, NVIDIA
- Security: Cisco, CrowdStrike, Palo Alto Networks
- Finance: JPMorganChase
- Open source: Linux Foundation
That's not a marketing coalition. When Apple, Google, and Microsoft are sitting at the same table for a security initiative, the threat model they're responding to is real.
Open Source Funding
Anthropic committed $4M to open source security organizations. That breaks down to $2.5M for Alpha-Omega/OpenSSF and $1.5M for the Apache Software Foundation (Anthropic, 2026). Many of the vulnerabilities Mythos finds live in open source codebases that lack dedicated security funding.
Benchmark Performance
Mythos scores 93.9% on SWE-bench Verified and 82.0% on Terminal-Bench 2.0, outperforming Opus 4.6 by double-digit margins across most coding benchmarks (Anthropic, 2026). The gains aren't incremental. They're generational.
Coding and Engineering Benchmarks
| Benchmark | Mythos | Opus 4.6 | Difference |
|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | +13.1pp |
| SWE-bench Pro | 77.8% | 53.4% | +24.4pp |
| SWE-bench Multilingual | 87.3% | 77.8% | +9.5pp |
| SWE-bench Multimodal | 59.0% | 27.1% | +31.9pp |
| Terminal-Bench 2.0 | 82.0% | 65.4% | +16.6pp |
| OSWorld-Verified | 79.6% | 72.7% | +6.9pp |
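As a sanity check, the percentage-point deltas in the table follow directly from the reported scores. A minimal sketch, with values transcribed from the table above:

```python
# Recompute the percentage-point deltas from the reported benchmark scores.
# Pairs are (Mythos score, Opus 4.6 score), transcribed from the table above.
scores = {
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "SWE-bench Multilingual": (87.3, 77.8),
    "SWE-bench Multimodal": (59.0, 27.1),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "OSWorld-Verified": (79.6, 72.7),
}

for name, (mythos, opus) in scores.items():
    delta = round(mythos - opus, 1)  # difference in percentage points
    print(f"{name}: +{delta}pp")
```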
The SWE-bench Multimodal number jumps out. A 31.9 percentage point improvement suggests Mythos has a fundamentally different approach to multimodal code reasoning, not just better pattern matching.
Reasoning and Knowledge Benchmarks
| Benchmark | Mythos | Opus 4.6 | Difference |
|---|---|---|---|
| GPQA Diamond | 94.6% | 91.3% | +3.3pp |
| Humanity's Last Exam (no tools) | 56.8% | 40.0% | +16.8pp |
| Humanity's Last Exam (with tools) | 64.7% | 53.1% | +11.6pp |
| CyberGym | 83.1% | 66.6% | +16.5pp |
| BrowseComp | 86.9% | 83.7% | +3.2pp |
On BrowseComp, Mythos achieved a higher score while using 4.9x fewer tokens (Anthropic, 2026). That's not just smarter. It's more efficient.
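The efficiency figure interacts with pricing in an interesting way. A back-of-envelope sketch, assuming the 4.9x token ratio from BrowseComp generalizes and using the Opus 4.7 output price from the pricing table in this article (the article doesn't state Opus 4.6 pricing, so this mixes model generations):

```python
# Back-of-envelope: does 4.9x token efficiency offset the 5x price premium?
# Prices from this article's pricing table; token ratio from BrowseComp.
mythos_output_price = 125.0  # $ per million output tokens (Mythos)
opus_output_price = 25.0     # $ per million output tokens (Opus 4.7)
token_ratio = 4.9            # Opus tokens consumed per Mythos token

# Cost of equivalent work: Mythos uses 1 unit of tokens where Opus uses 4.9.
mythos_cost = 1.0 * mythos_output_price
opus_cost = token_ratio * opus_output_price
print(mythos_cost / opus_cost)  # ~1.02: near parity on output-token cost
```

If the ratio held broadly, the 5x premium would mostly wash out on token-heavy tasks, though that's an extrapolation from a single benchmark.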
The CyberGym benchmark, purpose-built for cybersecurity tasks, shows a 16.5 percentage point gap. This aligns with Mythos being specifically optimized for vulnerability discovery rather than general-purpose improvements.
Cybersecurity Capabilities
Mythos has autonomously discovered thousands of zero-day vulnerabilities across every major operating system and web browser, without human steering (Anthropic, 2026). That's not a typo. The model finds exploitable bugs on its own.
Notable Discoveries
Two findings stand out for their sheer longevity:
- A 27-year-old vulnerability in OpenBSD, one of the most security-focused operating systems in existence. OpenBSD's codebase has been audited repeatedly by expert humans. Mythos found what they missed.
- A 16-year-old vulnerability in FFmpeg where the affected line of code had been hit 5 million times by automated security tools (Anthropic, 2026). Five million passes. Every one of them missed it.
This demonstrates something qualitatively different from existing static analysis or fuzzing tools. Those tools hit the FFmpeg line millions of times and saw nothing. Mythos understood the context around the code well enough to recognize the flaw.
Autonomous Exploit Chaining
Mythos doesn't just find individual bugs. It autonomously chained multiple Linux kernel vulnerabilities together to achieve privilege escalation (Anthropic, 2026). That's a capability previously associated with elite human security researchers and nation-state actors, not AI models.
For developers who've worked with existing static analysis tools like CodeQL or Semgrep, the difference here is fundamental. Those tools work from rules and patterns. Mythos appears to reason about code semantics, which is why it catches bugs that pattern-based tools cannot.
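A toy illustration of that gap, not drawn from any real tool's ruleset: the snippet below validates `length` but then reads past it, a flaw invisible to signature matching because no "dangerous" call appears anywhere.

```python
import re

# Toy example: why signature matching misses semantic bugs.
# The function validates `length`, then slices `length + 4` bytes,
# so it can return bytes beyond the bound it just checked.
source = '''
def parse_record(buf: bytes, length: int) -> bytes:
    if length > len(buf):
        raise ValueError("length exceeds buffer")
    return buf[: length + 4]  # off-by-four: checksum bytes never validated
'''

# A signature-style rule scans for known-dangerous calls and finds none.
dangerous = re.compile(r"\b(eval|exec|pickle\.loads)\s*\(")
print(dangerous.search(source))  # None: nothing here pattern-matches

# The flaw only surfaces when you relate the check (length) to the
# use (length + 4), which requires reasoning about what the code means.
```

Real scanners are far more sophisticated than a regex, but the underlying limitation, relating a bound to its use across the program's logic, is the one the FFmpeg and OpenBSD discoveries point at.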

Pricing and Access
Mythos is priced at $25 per million input tokens and $125 per million output tokens, approximately 5x the cost of Opus (Anthropic, 2026). But pricing is academic for most developers since access is restricted to Glasswing partners.
Claude Model Pricing Comparison
| Model | Input (per MTok) | Output (per MTok) | Access |
|---|---|---|---|
| Claude Mythos | $25.00 | $125.00 | Invitation-only (Glasswing) |
| Claude Opus 4.7 | $5.00 | $25.00 | API, Pro, Team, Enterprise |
| Claude Sonnet 4.6 | $3.00 | $15.00 | API, Pro, Team, Enterprise |
| Claude Haiku 4.5 | $1.00 | $5.00 | API, Pro, Team, Enterprise |
The Glasswing coalition has committed up to $100M in usage credits across partners (Anthropic, 2026). At Mythos pricing, that buys roughly 800 billion output tokens. For a consortium of 40+ organizations scanning massive codebases, that budget will go fast.
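The 800-billion-token figure is just the credit pool divided by the output price. A sketch assuming every dollar buys output tokens, which overstates the real capacity since input tokens also draw from the same pool:

```python
# Convert the $100M credit pool into output tokens at Mythos pricing.
# Assumes all credits go to output tokens; real usage mixes input and output.
credits = 100_000_000            # $100M in committed usage credits
output_price_per_mtok = 125.0    # $ per million output tokens

mtok = credits / output_price_per_mtok  # millions of tokens purchasable
tokens = mtok * 1_000_000
print(f"{tokens:,.0f} tokens")  # 800,000,000,000 -> 800 billion
```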
Mythos vs Other Claude Models
Mythos outperforms Opus 4.6 by 13.1 percentage points on SWE-bench Verified, but the models serve fundamentally different purposes (Anthropic, 2026). Opus is your general-purpose powerhouse. Mythos is a restricted specialist.
The Practical Difference
For day-to-day development work, nothing changes. You can't use Mythos. Sonnet and Opus 4.7 remain the models you'll build with. The interesting question is what happens when "Mythos-class" capabilities migrate into a future Opus release with appropriate safeguards.
That migration could mean:
- SWE-bench Verified scores above 90% in a generally available model
- Dramatically better multimodal code understanding (+31.9pp improvement)
- More efficient inference (4.9x fewer tokens on BrowseComp)
The real competitive implication isn't Mythos itself. It's that Anthropic has demonstrated capability levels that competitors haven't matched, and they chose not to ship it broadly. When those capabilities do reach general availability, Anthropic will have a meaningful head start on safety testing at that performance tier.
What This Means for Developers
For most developers, Mythos changes nothing today. You can't access it. But the 93.9% SWE-bench Verified score signals where AI-assisted coding is heading (Anthropic, 2026), and that trajectory matters for how you plan your tooling.
Short-Term Implications
If you're a security researcher or work at one of the Glasswing partner organizations, you may get access to Mythos for vulnerability discovery. For everyone else, the takeaway is simpler: keep building with Opus and Sonnet. They're still the best generally available models for coding tasks.
Medium-Term Implications
When Mythos-class capabilities arrive in a future Opus model, expect to revisit your assumptions about what AI can handle in your codebase. A model that autonomously chains kernel vulnerabilities can probably handle your microservices refactoring.
The $4M in OSS security funding also matters. If you maintain or depend on open source projects, Glasswing's vulnerability scanning could surface bugs in your dependency tree before attackers find them. That's a concrete benefit even without direct Mythos access.
Frequently Asked Questions
Can I use Claude Mythos through the API?
No. Claude Mythos is restricted to Project Glasswing partners and is not available through Anthropic's public API, Claude Pro, or any other consumer channel. Anthropic has explicitly stated there are no plans for general availability due to the model's offensive cybersecurity capabilities (Anthropic, 2026).
How is Mythos different from Claude Opus?
Mythos sits above Opus in Anthropic's model hierarchy. It scores 93.9% on SWE-bench Verified compared to Opus 4.6's 80.8%, with especially large gains on multimodal and cybersecurity benchmarks (Anthropic, 2026). The core difference is purpose: Opus is general-purpose, Mythos is restricted to defensive security work.
Will Mythos capabilities ever reach general availability?
Anthropic has indicated that "Mythos-class" capabilities will eventually be integrated into a future Opus model, but only after new safeguards are developed (Anthropic, 2026). No timeline has been announced. The bottleneck isn't capability, it's safety.
What is Project Glasswing?
Project Glasswing is a defensive cybersecurity coalition launched April 7, 2026, pairing Claude Mythos with over 40 organizations including AWS, Apple, Google, Microsoft, and CrowdStrike. The coalition has committed up to $100M in usage credits and $4M in donations to open source security foundations (Anthropic, 2026).
See also: Claude Opus 4.7: Benchmarks & Breakthroughs and Claude Design by Anthropic.
All statistics and claims in this article are sourced from Anthropic's Project Glasswing announcement, published April 7, 2026.
