Project Glasswing and Claude Mythos: When AI Finds 10,000 Security Vulnerabilities
The Moment AI Security Research Became Undeniable
Earlier this year, Anthropic quietly published results from an internal effort called Project Glasswing — and the security community has been processing the implications ever since. The headline finding: a Claude-based system they internally called Claude Mythos autonomously identified more than 10,000 real, previously unknown security vulnerabilities across major open-source codebases, including OpenBSD and FreeBSD.
Ten thousand. Not hypothetical weaknesses or theoretical attack surfaces. Actual CVE-worthy bugs in production operating system code that has been reviewed by humans for decades.
This changes the calculus around AI-assisted security in ways that go well beyond the usual “AI can help write secure code” talking points.
What Project Glasswing Actually Did
Anthropic hasn’t published a full technical paper on Glasswing as of this writing, but the outline is clear from what’s been disclosed. The system used an extended-context Claude variant (Mythos) to:
- Ingest entire codebases — not snippets, but full repository trees with cross-file symbol resolution
- Reason about data flow across multiple files and subsystems simultaneously
- Generate targeted exploit hypotheses — not just “this looks suspicious” but coherent attack chains
- Triage findings by severity automatically, filtering noise before human review
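The last bullet, triage before human review, can be pictured as a filter on severity and confidence. The sketch below is minimal and assumes a hypothetical `Finding` record, a four-level severity scale, and a confidence score that the model reports for itself. Anthropic has not published Glasswing's schema or thresholds, so every name and number here is illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-level scale; IntEnum makes levels comparable with >=
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    # Hypothetical record for one model-flagged issue
    file: str
    line: int
    title: str
    severity: Severity
    confidence: float  # model's self-reported confidence, 0.0 to 1.0


def triage(findings, min_severity=Severity.MEDIUM, min_confidence=0.5):
    """Drop low-severity or low-confidence findings, then order the rest so
    the most severe, most confident items reach a human reviewer first."""
    kept = [
        f for f in findings
        if f.severity >= min_severity and f.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda f: (f.severity, f.confidence), reverse=True)
```

A design note: the filter only reorders and prunes the queue. It never marks anything as confirmed, because confirmation is still a human's job.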
The OpenBSD and FreeBSD findings are notable because both projects take security unusually seriously. OpenBSD in particular is built around proactive code auditing. FreeBSD also underpins a lot of widely deployed infrastructure and products, including Netflix’s CDN, WhatsApp’s back end, and the operating system of Sony’s PlayStation consoles. Finding fresh CVEs there is not a trivial benchmark.
Why This Matters More Than Previous AI Security Research
Previous AI security tools — code scanners, static analyzers, even earlier LLM-based linters — operated at the function or file level. They were good at catching the obvious: SQL injection sinks, unchecked printf format strings, off-by-ones in obvious loops.
What made Glasswing different appears to be multi-file, multi-step reasoning at scale. Many serious vulnerabilities live not in any single function but in the interaction between components written by different people at different times. A value validated in module A is trusted without re-validation in module C because both developers assumed the other was handling it. That kind of bug requires holding a large mental model of the whole system — exactly where LLMs with large context windows have a structural advantage over traditional static analysis.
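The module A / module C pattern is easier to see in a concrete case. The following deliberately vulnerable Python sketch uses hypothetical names throughout. The original entry point validates a filename before passing it on. A second entry point, added later, calls the same helper without validating, and the helper trusts whatever it receives. No single function looks wrong on its own, which is why per-function scanners tend to miss this kind of bug.

```python
import posixpath

BASE_DIR = "/srv/uploads"  # hypothetical storage root


# "Module A": validation lives at the original entry point.
def validate_filename(name: str) -> str:
    if "/" in name or "\\" in name or name in ("", ".", ".."):
        raise ValueError(f"rejected filename: {name!r}")
    return name


def handle_web_download(name: str) -> str:
    return resolve_upload_path(validate_filename(name))


# "Module B": a later entry point assumes the helper does its own checking.
def handle_cli_download(name: str) -> str:
    return resolve_upload_path(name)  # BUG: no validation on this path


# "Module C": the helper assumes every caller already validated `name`.
def resolve_upload_path(name: str) -> str:
    return posixpath.normpath(posixpath.join(BASE_DIR, name))


def is_inside_base(path: str) -> bool:
    # The re-validation that module C could perform to close the hole.
    return posixpath.commonpath([BASE_DIR, path]) == BASE_DIR
```

Under this sketch, `handle_cli_download("../../etc/passwd")` resolves to `/etc/passwd`. Spotting that requires following the call from the unvalidated entry point into the helper, which is exactly the cross-file reasoning described above.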
Claude Fable 5, which offers a 1-million-token context window, sits in this same capability tier. A 1M context window is roughly 750,000 words — enough to load tens of thousands of lines of C source plus all its headers and keep everything in active “memory” during analysis.
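The 750,000-word figure is simply the window size multiplied by the common rule of thumb of about 0.75 English words per token. Source code tokenizes differently from prose. A stdlib-only sketch like this one can give a rough first answer to whether a given source tree fits before you measure with a real tokenizer. The 3.5 characters-per-token ratio and the 32,000-token output reserve are assumptions, not published figures.

```python
import math
from pathlib import Path

CONTEXT_TOKENS = 1_000_000   # window size quoted above
WORDS_PER_TOKEN = 0.75       # common rule of thumb for English prose
CHARS_PER_TOKEN = 3.5        # assumption for C source; verify with a real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Crude character-count estimate; real tokenizers vary by language and style."""
    return math.ceil(len(text) / chars_per_token)


def tree_tokens(root: str, suffixes=(".c", ".h")) -> int:
    """Estimated tokens for every matching source file under root."""
    return sum(
        estimate_tokens(p.read_text(encoding="utf-8", errors="replace"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


def fits_in_window(root: str, reserve_for_output: int = 32_000) -> bool:
    """Leave headroom for instructions and the model's answer."""
    return tree_tokens(root) <= CONTEXT_TOKENS - reserve_for_output


# 1,000,000 tokens * 0.75 words/token = 750,000 words, the figure quoted above.
approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
```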
The Dual-Use Question
It would be naive to write about this without addressing the obvious: if Claude can find 10,000 CVEs in open-source codebases, it can theoretically do the same for closed-source targets. Anthropic has been explicit about this tension — the Glasswing project was framed partly as an internal safety exercise to understand what the technology is capable of before it becomes widely available.
The security community broadly holds that defenders benefit more than attackers from AI vulnerability discovery, for a few structural reasons:
- Defenders need to find all the bugs while attackers only need one, so exhaustive automated search helps close the defender’s biggest gap
- Defenders can run continuous automated scanning at scale; attackers work under time pressure
- CVE disclosure pipelines mean found bugs get fixed, shrinking the attack surface
But this asymmetry isn’t permanent or guaranteed. A well-resourced adversary running the same class of tool against a closed system doesn’t have to disclose anything. The responsible use question matters.
What This Means for Developers Today
You don’t need a Glasswing-scale operation to benefit from this direction. Practical takeaways for teams building software right now:
Use Claude for Code Security Review — Not Just Code Review
There’s a meaningful difference. Most developers use AI assistants to check code correctness, style, and test coverage. Security review requires a different prompt posture: asking not “does this work” but “how could this be abused?”
Effective prompts for security-focused review with Claude:
"Assume you're a malicious attacker. Identify every way input from an untrusted source reaches a privileged operation in this code.""Trace all paths where user-controlled data reaches file system or shell operations, even indirectly.""What assumptions does this authentication flow make that an adversary could violate?"
Embrace Cross-File Context
If you’re pasting individual functions into a chat window, you’re leaving most of the value on the table. Feed Claude entire modules, or entire subsystems, and ask it to reason about component interactions. The API — whether direct or via a gateway like AI Prime Tech — lets you do this programmatically with large context models.
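To feed the model a whole subsystem instead of single functions, you can walk a source tree, wrap each file in a tag carrying its relative path, and stop before a size budget is exceeded. This is a minimal sketch. The `<file path=...>` tagging convention and the 3,000,000-character default are illustrative assumptions, not a documented format.

```python
from pathlib import Path


def pack_sources(root: str, suffixes=(".c", ".h"), max_chars: int = 3_000_000):
    """Concatenate matching files under `root` into one prompt body.

    Each file is wrapped in a tag carrying its repository-relative path so the
    model can cite locations. Headers come first so macro and type definitions
    precede the code that uses them. Files that would push the body past
    `max_chars` are skipped and returned, never silently truncated.
    """
    root_path = Path(root)
    files = sorted(
        (p for p in root_path.rglob("*") if p.is_file() and p.suffix in suffixes),
        key=lambda p: (p.suffix != ".h", p.as_posix()),
    )
    parts, skipped, used = [], [], 0
    for path in files:
        rel = path.relative_to(root_path).as_posix()
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f'<file path="{rel}">\n{text}\n</file>\n'
        if used + len(chunk) > max_chars:
            skipped.append(rel)
            continue
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts), skipped
```

Returning the skipped list matters more than it looks: a review that silently omitted half a subsystem would report "no cross-file issues" for code it never saw.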
Treat AI Findings as Triage, Not Ground Truth
Glasswing used human security engineers to validate AI-flagged findings before reporting CVEs. That step is not optional. AI models produce false positives, misunderstand security contexts, and occasionally hallucinate exploit chains that don’t work. The value is in the speed of the first pass, not in replacing expert judgment.
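One cheap guard against hallucinated findings is a set of mechanical checks that run before a human spends time on a report. The checks confirm that the cited file exists inside the repository, that the line number is in range, and that any quoted snippet actually appears. The sketch assumes the model was asked to return a JSON array of objects with `file`, `line`, `description`, and optional `snippet` fields. That format is our assumption, not one Glasswing is known to use. Passing these checks only means a finding deserves an expert's attention. It says nothing about exploitability.

```python
import json
from pathlib import Path


def check_finding(finding: dict, root: str) -> list:
    """Return reasons a model-reported finding fails basic sanity checks.

    An empty list means only that the finding is worth an expert's time,
    not that the vulnerability is real or exploitable.
    """
    if not isinstance(finding.get("file"), str) or "description" not in finding:
        return ["missing or malformed 'file' / 'description' field"]
    root_path = Path(root).resolve()
    path = (root_path / finding["file"]).resolve()
    # Model output is untrusted input too: refuse paths that escape the repo.
    if root_path not in path.parents or not path.is_file():
        return [f"cited file not found in repository: {finding['file']!r}"]
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    problems = []
    line = finding.get("line")
    if not isinstance(line, int) or not 1 <= line <= len(lines):
        problems.append(f"cited line {line!r} is outside 1..{len(lines)}")
    snippet = finding.get("snippet")
    if isinstance(snippet, str) and snippet not in "\n".join(lines):
        problems.append("quoted snippet does not appear in the cited file")
    return problems


def split_for_review(raw_json: str, root: str):
    """Parse a JSON array of finding objects and route each one either to the
    human review queue or to a rejected list, keeping the rejection reasons."""
    needs_expert, rejected = [], []
    for finding in json.loads(raw_json):
        reasons = check_finding(finding, root)
        if reasons:
            rejected.append((finding, reasons))
        else:
            needs_expert.append(finding)
    return needs_expert, rejected
```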
Run It on Your Dependencies Too
Your codebase is only as secure as its dependency tree. Submitting a suspect third-party library to Claude for a focused review — “look for authentication bypasses and unsafe deserialization” — is a legitimate and underutilized practice.
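A focused dependency review is easier to scale if you first decide which files in a dependency deserve the model's attention. The sketch below targets Python dependencies, such as an unpacked wheel or a vendored package directory. It uses a small regex prefilter for calls commonly involved in unsafe deserialization or code execution. The pattern list is our own illustrative assumption. It will both miss things and over-flag, which is acceptable for a prioritization step.

```python
import re
from pathlib import Path

# Illustrative starting list; extend it for your own language and stack.
RISKY_PATTERNS = {
    "pickle.loads": re.compile(r"\bpickle\.loads?\("),
    "marshal.loads": re.compile(r"\bmarshal\.loads?\("),
    "yaml.load": re.compile(r"\byaml\.load\("),         # yaml.safe_load( does not match
    "eval/exec": re.compile(r"(?<![\w.])(?:eval|exec)\("),  # skips literal_eval(, obj.eval(
}

FOCUS = "Look for authentication bypasses and unsafe deserialization."


def rank_files(root: str) -> list:
    """Count risky-call hits per .py file, most hits first, to decide which
    files to send for a focused security review."""
    scores = []
    for path in Path(root).rglob("*.py"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = {name: len(p.findall(text)) for name, p in RISKY_PATTERNS.items()}
        total = sum(hits.values())
        if total:
            rel = path.relative_to(root).as_posix()
            scores.append((total, rel, {k: v for k, v in hits.items() if v}))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores


def focused_prompt(rel_path: str, source: str) -> str:
    """Pair the focus question from the text with one dependency file."""
    return f'{FOCUS}\n\n<file path="{rel_path}">\n{source}\n</file>'
```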
The Broader Shift in Security Tooling
Project Glasswing is probably not the last project of its kind. Competing labs are running similar internal research. Within two to three years, AI-powered continuous vulnerability scanning will likely be table stakes for any serious software organization — the way unit tests and SAST scanners are today.
The teams that build the muscle now — learning what prompts produce useful security analysis, how to integrate AI review into CI/CD pipelines, how to triage AI-flagged findings efficiently — will have a meaningful head start.
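Integrating AI review into CI mostly comes down to two small decisions: which files to send on each change, and what should block a merge. The sketch below covers both. It assumes you pass in the output of `git diff --name-only` against your base branch and that findings carry hypothetical `id` and `severity` fields. The policy shown, blocking on unreviewed high-severity findings while letting humans dismiss false positives, is one reasonable choice among several.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def changed_sources(diff_name_only: str, suffixes=(".c", ".h")) -> list:
    """Pick reviewable files out of `git diff --name-only <base>...HEAD` output."""
    names = (line.strip() for line in diff_name_only.splitlines())
    return [name for name in names if name.endswith(suffixes)]


def gate_exit_code(findings: list, dismissed_ids: set, fail_at: str = "high") -> int:
    """Return 1 (fail the job) if any finding at or above `fail_at` has not yet
    been dismissed by a human reviewer, else 0. Dismissals record triaged false
    positives so the same noise does not block every later build."""
    threshold = SEVERITY_RANK[fail_at]
    for finding in findings:
        rank = SEVERITY_RANK.get(finding.get("severity"), 0)
        if rank >= threshold and finding.get("id") not in dismissed_ids:
            return 1
    return 0
```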
The interesting flip side: if AI tools become standard for finding bugs, codebases will need to be written with that in mind. More modular architectures, cleaner data flow documentation, explicit trust boundary annotations — these will become more valuable as they make AI analysis more tractable.
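In Python, one lightweight form of trust boundary annotation is a pair of distinct types for unvalidated and validated data. Both a type checker and a reader, human or model, can then see where the boundary sits. This sketch uses hypothetical names. `typing.NewType` exists only for static checking, so the code documents the boundary and lets mypy or a similar tool check it, but it enforces nothing at runtime.

```python
from typing import NewType

# Distinct static types: a type checker rejects passing Untrusted where
# Validated is expected, so boundary crossings show up in signatures.
Untrusted = NewType("Untrusted", str)
Validated = NewType("Validated", str)

ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789_-")


def from_request(raw: str) -> Untrusted:
    # Trust boundary: everything arriving from the network starts as Untrusted.
    return Untrusted(raw)


def validate_username(value: Untrusted) -> Validated:
    # The only sanctioned way to turn Untrusted into Validated.
    if not 1 <= len(value) <= 32 or not set(value) <= ALLOWED_CHARS:
        raise ValueError(f"invalid username: {value!r}")
    return Validated(value)


def lookup_home_dir(username: Validated) -> str:
    # Privileged sink: its signature states that it expects validated input.
    return f"/home/{username}"
```

With these annotations, the question "can untrusted input reach `lookup_home_dir` without passing through `validate_username`?" becomes something a type checker can partly answer. It is also much easier for an AI reviewer to trace.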
Takeaway
Project Glasswing and Claude Mythos represent a qualitative step change in automated security research, not an incremental improvement. Finding more than 10,000 previously unknown vulnerabilities across major open-source codebases, hardened operating systems like OpenBSD and FreeBSD among them, is not a demo; it’s a proof of capability. For developers, the immediate practical implication is simple: if you’re not using Claude for security-focused code review today, you’re leaving a powerful tool on the table. Start with your most sensitive codebases, give it real cross-file context, and treat the output as a high-bandwidth first-pass triage layer. The human expert still closes the loop, but AI gets you there faster.