Project Glasswing: AI Finds Decades-Old Zero-Days

Project Glasswing illustration: Claude Mythos Preview discovering decades-old vulnerabilities across operating systems and browsers

On April 7, 2026, Anthropic announced Project Glasswing and disclosed the capabilities of Claude Mythos Preview, an unreleased frontier model that has autonomously discovered thousands of zero-day vulnerabilities across every major operating system and web browser. The list of disclosed findings includes a 27-year-old bug in OpenBSD, a 17-year-old unauthenticated remote root in FreeBSD (CVE-2026-4747), and a 16-year-old out-of-bounds write in FFmpeg that had been hit more than 5 million times by automated fuzzing tools without ever being caught.

The model will not be released publicly. Instead, Anthropic is routing it through a coalition of 12 founding partners (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and Anthropic itself) plus more than 40 additional critical-infrastructure organizations and open-source maintainers. For MSPs and anyone who runs software that matters, the signal is straightforward: the economics of vulnerability discovery just changed, and not in the defenders’ favor.

The model and the jump from Opus 4.6

Mythos Preview is a general-purpose Claude model. Anthropic is explicit in its own write-up that the security capabilities were not the result of targeted training. They “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.” That framing matters. If exploit-class capabilities fall out of generic capability improvements, they will keep reappearing in every frontier lab’s next model.

Before looking at the security numbers, look at the general ones. The 244-page Mythos Preview system card reports a double-digit jump over Opus 4.6 on almost every coding, reasoning, and computer-use benchmark Anthropic publishes.

These are coding, reasoning, and broad-knowledge benchmarks, not security-specific ones. The security headlines sit on top of a model that is meaningfully better at almost everything.

The jump from Claude Opus 4.6 (released only a few months earlier) to Mythos Preview is not incremental. On the security benchmarks specifically, it is closer to two orders of magnitude.

Firefox JavaScript engine: working exploits built from vulnerabilities

181

Opus 4.6

Mythos Preview

Successful exploit runs across several hundred attempts. Source: Anthropic red team.

On Firefox’s JavaScript engine, Opus 4.6 turned vulnerabilities into working exploits 2 times across hundreds of attempts. Mythos Preview: 181 working exploits, plus an additional 29 runs where the model achieved register control but stopped short of full weaponization. On OSS-Fuzz targets, Mythos produced 595 crashes at tier 1-2 severity and 10 full tier-5 control-flow hijacks, where Opus 4.6 had managed only isolated tier-3 crashes.

The 89% figure is the one that should make every security engineer stop and reread it: when Anthropic put 198 of the model’s vulnerability reports in front of human validators, 89% matched the model’s own severity assessment exactly, and 98% were within one severity level. Mythos is not just finding bugs. It is triaging them accurately enough that human analysts agree with its calls at a rate most junior pentesters would not hit.

What it actually found

A non-exhaustive sample of publicly disclosed findings from the Anthropic post. Less than 1% of the total discoveries have been patched so far, so the rest is still inside coordinated disclosure windows.

A 27-year-old kernel crash in OpenBSD. The TCP SACK implementation contained a signed integer overflow that let any remote attacker crash any OpenBSD machine simply by connecting to it. OpenBSD is the operating system with the most aggressive security posture in mainstream use. The bug survived nearly three decades of audit.

CVE-2026-4747: unauthenticated remote root on FreeBSD via NFS. Mythos autonomously constructed a 20-gadget ROP chain that escalates to full root without any authentication. Building ROP chains of that length was, until this announcement, near the upper bound of what elite human exploit developers did by hand.

A 16-year-old H.264 slice sentinel flaw in FFmpeg. FFmpeg sits inside practically every video pipeline on the internet. The vulnerable code had been covered by continuous fuzzing for years and had been hit more than 5 million times without the existing tools ever classifying it as a flaw. The vulnerability has existed since 2003 and, per Anthropic’s research, was already weaponized by threat actors in 2010. It just wasn’t found by defenders until now.

Critical authentication bypass in Botan. Botan is a widely used C++ cryptography library. Mythos found a cryptographic authentication bypass that was disclosed on April 7.

Linux kernel privilege escalation chain. The model autonomously identified and chained multiple kernel vulnerabilities to move from a regular unprivileged user to complete machine control. No human guidance in the chain construction.

A 4-bug browser exploit. Mythos built a JIT heap spray that combined four separate vulnerabilities and escaped both the browser’s renderer sandbox and the underlying OS sandbox in one chain.

The economics have changed

The costs that Anthropic published get less attention than the vulnerability list, but they are arguably the most important numbers in the entire announcement.

OpenBSD vulnerability discovery: under $50 per successful run. A decades-old kernel bug in the most hardened OS in the world, for the price of a dinner.
FFmpeg exploitation: roughly $10,000 across several hundred runs. Hit rate high enough to amortize well.
Linux kernel exploit development: under $2,000. Less than a mid-range laptop.

Those figures are total run costs, not token rates. Anthropic has disclosed that the post-preview price for Mythos-class access will be $25 per million input tokens and $125 per million output tokens, roughly an order of magnitude above current frontier-model rates. The cost per exploit still lands where it does because a single run does the work of a team of specialists, not because the model itself is cheap.

Until yesterday, finding and weaponizing a kernel-level zero-day was the work of elite state-level teams or private offensive-security shops charging six or seven figures per bug. The Anthropic numbers compress that by two to four orders of magnitude. And unlike a human team, the model does not sleep, does not get bored, and does not specialize in a single OS family.

Security researcher Thomas Ptacek, writing a week before the Anthropic announcement, put a name to what was changing. In his essay Vulnerability Research Is Cooked, he argued that software has been shielded from exploits “not only by soundly engineered countermeasures but also by a scarcity of elite attention.” Finding a bug in a modern browser requires a researcher who understands both security and the archaic internals of whatever subsystem the bug lives in: font rendering, the JIT, the networking stack, the memory allocator. That combination of deep skills used to exist in a tiny number of people. Ptacek titled his next section “The New Price Of Elite Attention: ε.” The Anthropic cost figures are what ε looks like in dollars.

The defender-side framing in the Glasswing announcement is blunt: the window between a vulnerability being discovered and being exploited by an adversary has collapsed. What used to take months can now happen in minutes.

Project Glasswing and the coalition

Project Glasswing is the operational response. Rather than ship Mythos Preview to the public, Anthropic is distributing access to organizations that can use it to harden widely deployed software before the same capabilities appear in the hands of adversaries.

The commitments announced on April 7:

Up to $100 million in Claude Mythos Preview usage credits for coalition partners and approved critical-infrastructure organizations.
$2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation.
$1.5 million to the Apache Software Foundation.
More than 40 additional organizations that maintain critical software have been granted access.
Open-source maintainers can apply for access through the Claude for Open Source program.
A 90-day disclosure window with an optional 45-day extension for maintainers who need more time. SHA-3 hash commitments are published up front so disclosure claims can be independently verified after the embargo.
Human validators triage every reported issue before it reaches maintainers, matching Mythos on severity 89% of the time and within one level 98% of the time.
A full public report on findings, patched vulnerabilities, and operational lessons is promised within 90 days.

Anthropic has said Mythos Preview itself will not become generally available. The implication buried in the announcement: future Opus releases will eventually carry “Mythos-class” capabilities but only with additional safeguards in place.

What this means for MSPs and their clients

Two forces are now pulling in opposite directions.

Offensive risk has increased. If Anthropic’s model can do this, similar capabilities will surface at other frontier labs within months, and eventually in the hands of criminal actors. Attacks will be more frequent, faster, and more sophisticated. The clients most exposed are exactly the ones MSPs spend the most time defending: manufacturing, construction, finance, and any organization running older, unpatched software on legacy operating systems and ICS equipment. Those are precisely the targets a model like Mythos finds first.

Defensive capability is also increasing. The security vendors most MSPs already depend on (CrowdStrike, Palo Alto Networks, Microsoft) are all in the Glasswing coalition. Expect AI-augmented vulnerability discovery and response to land in their products over the next 6 to 12 months. The gap will not close for every client equally, though. Organizations on older product tiers, or those who skipped the XDR and managed-response upgrades, will not automatically benefit.

Concrete things worth doing right now:

Aggressive patch management. If a 27-year-old bug is findable by AI in a single overnight run, there is no longer a defensible reason to run unpatched software. Critical patches should be applied in hours, not weeks.
Legacy software audit. Any system that no longer receives security updates has moved from “technical debt” to “acute risk.” Inventory what each client runs, and put every end-of-life OS and library on an exit plan.
Stricter network segmentation. If exploits become easier to produce, lateral movement has to be harder to do. VLANs, firewalls, and zero-trust rules that were “on the roadmap” need to land on the next change window.
Continuous monitoring. SIEM, EDR, and real-time alerting close the gap between initial access and detection. With a shrinking exploit window, detection time is the difference between an incident and a breach.

The elephant in the room

Anthropic announced all of this one week after their own security incident. A packaging error in Claude Code version 2.1.88 exposed nearly 2,000 source files and over 512,000 lines of unobfuscated TypeScript. Then Anthropic’s DMCA takedown notices accidentally hit roughly 8,100 GitHub repositories before the bulk of the takedown was retracted.

Not a minor detail. When you are the company saying your model is too dangerous to release publicly, you need to have your own house in order. The irony isn’t lost on anyone.

But it doesn’t change the core issue. The capabilities are real. The bugs found are real. And the trend is clear: AI models are getting better at finding and exploiting vulnerabilities, and that is not going to stop.

Key takeaways

Axios npm Supply Chain Attack: What You Need to Know

Analysis of the axios npm supply chain attack that dropped a cross-platform RAT via maintainer account compromise, with IOCs and defensive steps.

LiteLLM Supply Chain Attack: What MSPs Need to Know

Analysis of the TeamPCP supply chain attack on LiteLLM via compromised Trivy GitHub Actions, covering the 3-layer payload, IOCs, and defensive actions for MSPs.

Huntress 2026 Cyber Threat Report: Key Findings for MSPs

Analysis of the Huntress 2026 Cyber Threat Report covering identity compromise, RMM abuse, ClickFix loaders, ransomware timelines, and a 30-day action plan.

Project Glasswing: AI Finds Decades-Old Zero-Days

The model and the jump from Opus 4.6

SWE-Bench Pro

Terminal-Bench 2.0

SWE-Bench Verified

SWE-Bench Multilingual

SWE-Bench Multimodal

GPQA Diamond

Humanity's Last Exam (no tools)

Humanity's Last Exam (with tools)

BrowseComp

OSWorld-Verified

Firefox JavaScript engine: working exploits built from vulnerabilities

What it actually found

The economics have changed

Project Glasswing and the coalition

What this means for MSPs and their clients

The elephant in the room

Key takeaways

Project Glasswing: AI Finds Decades-Old Zero-Days

The model and the jump from Opus 4.6

SWE-Bench Pro

Terminal-Bench 2.0

SWE-Bench Verified

SWE-Bench Multilingual

SWE-Bench Multimodal

GPQA Diamond

Humanity's Last Exam (no tools)

Humanity's Last Exam (with tools)

BrowseComp

OSWorld-Verified

Firefox JavaScript engine: working exploits built from vulnerabilities

What it actually found

The economics have changed

Project Glasswing and the coalition

What this means for MSPs and their clients

The elephant in the room

Key takeaways

Related reading