Inside the Zero-Days Claude Mythos Discovered
Mythos found a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw. Learn what these reveal about legacy code and AI-era defense.
Two findings stand out in Anthropic's published examples of what Claude Mythos Preview has discovered. The first is a 27-year-old integer overflow vulnerability in OpenBSD, an operating system whose entire reputation rests on being one of the most security-audited codebases in the world. The second is a 16-year-old flaw in FFmpeg that survived more than five million automated fuzzing tests and decades of human review.
Neither of these findings is a curiosity. Both are signals about the state of mature software, and both carry consequences for every enterprise running anything older than this morning. The temptation when reading the headlines is to focus on the model. The more important question is what these specific discoveries reveal about the codebases all of us depend on.
This article walks through what Mythos has found, why these particular discoveries matter more than the volume number, and what enterprises sitting on years of legacy code should do about it.
A quick tour of what Mythos has discovered so far
Anthropic's published technical disclosures cover thousands of zero-day vulnerabilities discovered by Mythos Preview in the weeks since the model became operational. The full list is not public; most findings remain under coordinated disclosure with affected vendors. But the company has shared enough detail to characterize the scope and depth of the discoveries.
The vulnerabilities span every major operating system: Windows, macOS, Linux distributions, BSD variants, mobile operating systems including iOS and Android. They span every major web browser engine: Chromium, WebKit, Gecko. They include critical findings in widely deployed open-source libraries, the kind of dependencies that quietly underpin most enterprise software stacks. Several findings enable full remote code execution. Several enable privilege escalation chains that allow attackers to root smartphones or escalate from user to kernel on desktop operating systems.
The bugs are not concentrated in poorly maintained software. Many of them are in code that has been reviewed continuously for years by some of the best security engineers in the industry. Some are in code that has been the explicit target of dedicated security research programs and fuzzing infrastructure for over a decade. The fact that they were found by an AI model rather than by humans doing the same work the same way is the entire point.
Mythos also demonstrated capabilities beyond pure discovery. It can take closed-source stripped binaries, reconstruct plausible source code, and then find vulnerabilities in the reconstructed code. It can chain individual findings into multi-stage attack sequences without human steering. It can write working exploit code, not just identify the vulnerable surface. The UK AI Security Institute confirmed in independent evaluations that Mythos can complete in minutes what would take human professionals days of work.
The 27-year-old OpenBSD bug: why it matters that this one survived
OpenBSD occupies a unique position in software history. The project's stated focus is correctness and security as primary engineering goals, treated as more important than features, performance, or convenience. The codebase has been continuously audited by a small group of engineers for nearly thirty years, with security review built into the development process at every level. The project's marketing line, "Only two remote holes in the default install, in a heck of a long time", is a deliberate statement about what comprehensive security review accomplishes.
When Mythos finds a 27-year-old integer overflow in this codebase, the question is not "How did OpenBSD miss this?" The question is "What are the limits of human-only security auditing?"
Integer overflow vulnerabilities are not exotic. They are one of the oldest, best-understood, and most-targeted classes of memory safety bugs in software. The pattern is well-documented. Static analyzers look for it. Fuzzers exercise it. Code review checklists include it. And yet a particular instance of this pattern, in arguably the most security-conscious mainstream codebase in the world, survived 27 years of expert review and was found in hours by a model that had never seen the code before.
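To make the bug class concrete, here is a minimal sketch of the pattern. This is not the OpenBSD code, which remains under coordinated disclosure; all names and values are invented for illustration, with C-style 32-bit unsigned arithmetic simulated in Python by masking.

```python
# Hypothetical illustration of the integer-overflow bug class, NOT the
# actual OpenBSD finding. C-style uint32 arithmetic simulated with a mask.

U32 = 0xFFFFFFFF  # 32-bit wrap mask

def checked_copy_vulnerable(header_len: int, payload_len: int, buf_size: int) -> bool:
    """Classic broken bounds check: the sum wraps before the comparison."""
    total = (header_len + payload_len) & U32   # wraps modulo 2**32
    return total <= buf_size                   # passes even for huge inputs

def checked_copy_fixed(header_len: int, payload_len: int, buf_size: int) -> bool:
    """Correct form: detect the wrap before trusting the sum."""
    if payload_len > (U32 - header_len):       # addition would overflow
        return False
    return header_len + payload_len <= buf_size

# An attacker-chosen payload_len near 2**32 wraps the sum to a tiny value,
# so the vulnerable check approves a copy far larger than the buffer.
ok_vuln = checked_copy_vulnerable(16, 0xFFFFFFF8, 4096)   # True: check bypassed
ok_fixed = checked_copy_fixed(16, 0xFFFFFFF8, 4096)       # False: rejected
```

The pattern is two lines of arithmetic, which is exactly why it is so easy to write and so easy to skim past in review: the code reads as a correct bounds check unless the reader is actively modeling wraparound.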
The lesson is not "OpenBSD is insecure." OpenBSD is, by any reasonable measure, one of the more secure operating systems available. The lesson is that human-grade adversarial review has measurable limits, and those limits sit well below what AI-grade review can now reach. If 27-year-old bugs exist in a codebase reviewed continuously for security, what is sitting in your enterprise software, written by teams optimizing for delivery speed under deadline pressure, with security review applied retrospectively if at all?
The honest answer is: a lot. We do not know how much. We know it is more than zero. We know it is more than the publicly disclosed CVE count, which has always lagged the actual vulnerability inventory by an unknown but large multiple. The OpenBSD finding tells us the gap between disclosed vulnerabilities and actual vulnerabilities in mature code is bigger than most people in the industry have been willing to assume.
The 16-year-old FFmpeg flaw: 5 million tests could not find it
FFmpeg is the second proof point, and in some ways the more uncomfortable one.
FFmpeg is the open-source media framework that handles audio and video encoding and decoding for an enormous fraction of the internet. It is embedded in browsers, video players, streaming services, content delivery networks, and countless enterprise applications. It has been a focus of security research for years, in part because its attack surface (parsing complex, attacker-controlled binary formats) is exactly the kind of code where memory safety vulnerabilities tend to cluster.
The security research community has poured enormous effort into fuzzing FFmpeg. Google's OSS-Fuzz infrastructure runs continuous fuzzing campaigns against it. Academic papers have been written specifically on how to fuzz media libraries effectively. Distributed fuzzing infrastructure has fed FFmpeg millions of randomly generated inputs continuously for over a decade. The cumulative number of fuzz tests run against the library is in the billions.
Mythos found a vulnerability that survived all of it. The flaw had been in FFmpeg for 16 years, escaped more than five million automated tests, and was discovered by an AI model approaching the code from a different angle than the fuzzers had taken.
The lesson here is more specific than the OpenBSD case. Fuzzing is the dominant automated technique for finding memory safety vulnerabilities in C and C++ codebases. The argument for fuzzing has always been that random input exploration eventually exercises every reachable code path, and any path that can be reached can be tested for crashes. The implicit assumption is that "billions of fuzz inputs" approximates "complete coverage" for practical purposes.
The Mythos finding undermines that assumption in a way that should worry anyone responsible for enterprise software security. If the dominant automated security technique for the most-fuzzed codebase in the open-source world cannot find a 16-year-old flaw, then "we have comprehensive fuzzing coverage" is not the safety statement it has been treated as. The vulnerabilities are there. The fuzzing infrastructure was not finding them. AI models, reasoning about the code rather than randomly exploring it, can find them in hours.
This generalizes. Every enterprise that has invested in fuzzing infrastructure as a primary security control just discovered the limit of that investment. Fuzzing remains valuable. It is not, as it turns out, sufficient.
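The structural blind spot in the coverage argument is easy to demonstrate with a toy example. The sketch below is not the FFmpeg flaw; it is an invented parser whose bug sits behind an equality relation between two fields, a path that blind random mutation essentially never reaches. (Modern coverage-guided fuzzers mitigate single magic-value comparisons with comparison tracing, but deeper semantic relations gate paths in the same way.)

```python
import os

# Invented toy format: a 4-byte length field followed by a 4-byte tag.
# The latent bug fires only when length == tag ^ 0xDEADBEEF, a relation
# a random input satisfies with probability about 2**-32.
def toy_parse(data: bytes) -> str:
    if len(data) < 8:
        return "short"
    length = int.from_bytes(data[:4], "little")
    tag = int.from_bytes(data[4:8], "little")
    if length == tag ^ 0xDEADBEEF:
        raise RuntimeError("latent bug reached")
    return "ok"

def random_fuzz(trials: int) -> int:
    """Blind random fuzzing: count how often the bug is hit."""
    crashes = 0
    for _ in range(trials):
        try:
            toy_parse(os.urandom(16))
        except RuntimeError:
            crashes += 1
    return crashes

# Expected crash count for N random inputs is N / 2**32: zero in practice,
# even at fuzzing-campaign scale. Reasoning from the source, by contrast,
# yields a triggering input immediately:
witness = (0).to_bytes(4, "little") + (0xDEADBEEF).to_bytes(4, "little")
```

A reader working from the code, human or model, sees the relation and constructs the witness directly. That asymmetry between exploring inputs and reasoning about source is the gap the FFmpeg finding exposed.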
What these discoveries tell us about mature codebases
Take the two findings together and a pattern emerges that is bigger than either individual bug.
The vulnerabilities Mythos has been finding are, in many cases, in code that has had every reasonable security investment thrown at it. Continuous human review. Continuous automated testing. Bug bounty programs. Dedicated security teams. Audit-driven hardening. Years of public attention from security researchers. And the bugs were still there, undiscovered, until a model with a different approach came along.
This has three implications.
First, the disclosed-vulnerability-to-actual-vulnerability ratio in mature code is worse than the industry's working assumption. We have been operating as though the systematic security investments of the last twenty years had driven the latent vulnerability count in mature code toward something close to zero, with the remaining undiscovered bugs being exotic edge cases. The Mythos findings suggest the actual latent count is closer to "thousands per major codebase" than "tens." This changes the threat model in important ways.
Second, the categories of bugs being found are not exotic. Integer overflows. Memory corruption. Logic errors in complex parsers. These are the same bug classes that security researchers have been finding by hand for decades. The model is not finding novel bug classes the industry has not seen before. It is finding instances of well-known bug classes that human reviewers and fuzzers happened to miss. The implication: every codebase mature enough to have been audited for these bug classes is also likely to contain undiscovered instances of them.
Third, the security investment that made codebases like OpenBSD and FFmpeg better than average did its job, and the residual vulnerabilities are still substantial. Codebases with less investment, which is to say most enterprise software, are statistically much worse. The distribution of latent vulnerabilities across codebases has its best case at OpenBSD-grade rigor and gets worse from there.
Legacy code as liability: why modernization just got urgent
For most of the last twenty years, legacy code was treated as an inconvenience: old, hard to maintain, difficult to integrate with modern infrastructure, but functional. The dominant institutional posture was "if it works, leave it alone." Modernization happened when external pressure forced it (a vendor end-of-life, a regulatory requirement, a board-level digital transformation initiative), but it was rarely treated as urgent for security reasons.
That posture no longer makes sense. Every line of code in production right now was written under the assumption that human-grade adversarial review was the worst case it would face. That assumption was wrong, and the older the code, the more wrong it gets.
Banks running COBOL workloads written in the 1980s. Hospitals running Windows Server 2012 because medical device certification cycles make upgrades expensive. Industrial control systems running firmware from 2008. Insurance carriers running mainframe applications older than most of their employees. All of this code is sitting on a population of latent zero-days that just became findable by anyone with a few thousand dollars of compute budget.
The right response is not panic. The right response is to revise the modernization calculus. Legacy code is no longer just a maintenance cost; it is an active security liability whose risk profile just shifted. Modernization investments that did not pencil out under the old threat model deserve a fresh look. Every enterprise should be asking, in this quarter's planning cycle, "Which of our legacy systems would we most regret seeing on the Mythos-discovered CVE list six months from now?" and accelerating the work on those.
This is also a useful framing for vendor conversations. Vendors that are actively investing in security review and modernization of their own codebases are about to look much better than vendors that are not. The procurement question "what is your strategy for AI-discovered vulnerabilities in your product" is going to become standard, and vendors who do not have a credible answer are going to lose deals.
What enterprises sitting on old code should do right now
Five concrete moves, in priority order:
Inventory legacy code by risk concentration. Which systems are running code more than five years old? Which of those systems handle sensitive data, are exposed to untrusted input, or are connected to networks the rest of the enterprise depends on? Build the heat map now, before you need it for a board conversation in 60 days.
Accelerate planned modernization. For systems already on a modernization roadmap, look hard at the timeline. Most modernization plans assume a multi-year horizon because the work is hard and the urgency was modest. The urgency just changed. Some of those plans need to compress.
Apply compensating controls aggressively. For legacy systems that cannot be modernized quickly, the answer is not to ignore the risk. It is to wrap the systems in layers that limit blast radius when the inevitable AI-discovered CVE arrives. Network segmentation. Identity hygiene. Application-layer firewalls. Comprehensive logging at the boundary so you can detect anomalous behavior even when you cannot patch the underlying code.
Engage vendors on their security posture. Vendors of legacy products you depend on need to know that AI-discovered vulnerabilities in their products are now your problem in a more acute way. Make the conversation explicit. Ask what they are doing about it. Use the answers in renewal negotiations.
Build the visibility layer. When the AI-discovered CVE wave hits, the question for every legacy system in your environment is going to be "did this touch us, and when." That question requires telemetry that goes back further than your current retention window. The work to build that visibility starts before the disclosure, not after.
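The first move, the risk-concentration inventory, can be roughed out before any tooling is bought. The sketch below is a hypothetical tiering scheme, not a standard: the ages would come from commit history or vendor end-of-life data, and the exposure flags from your data-classification and network inventories. All system names are invented.

```python
from dataclasses import dataclass

# Hypothetical heat-map sketch for the legacy-code inventory step.
# Thresholds and flags are placeholders to be replaced with real data.

@dataclass
class System:
    name: str
    code_age_years: float          # age of the oldest production code
    handles_sensitive_data: bool
    takes_untrusted_input: bool
    network_critical: bool         # rest of the enterprise depends on it

def risk_tier(s: System) -> str:
    """Age gates the tier; exposure factors escalate it."""
    if s.code_age_years <= 5:
        return "green"
    exposure = sum([s.handles_sensitive_data,
                    s.takes_untrusted_input,
                    s.network_critical])
    return "red" if exposure >= 2 else "amber"

inventory = [
    System("claims-mainframe", 31.0, True, True, True),
    System("hr-portal", 7.5, True, False, False),
    System("marketing-site", 1.2, False, True, False),
]
heat_map = {s.name: risk_tier(s) for s in inventory}
# claims-mainframe lands in "red": old code, sensitive data, untrusted
# input, and network criticality. It is the system to modernize first.
```

Even a crude tiering like this is enough to order the modernization and compensating-control work, and to anchor the board conversation in something more specific than "our code is old."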
The Mythos findings are a preview, not an outlier. The vulnerabilities Anthropic has disclosed publicly are a small subset of what the model has found, and the model itself is a small subset of what AI vulnerability discovery will look like by 2027. Every enterprise gets to choose now whether to do this work proactively or reactively. The proactive option is much cheaper.
Related reading
- What Claude Mythos Means for the Future of Cybersecurity. The pillar piece on the broader strategic shift.
- AI Vulnerability Discovery: The New Defender Economics. Why the cost of finding bugs collapsed, and what that means.
- Patch Window Collapsed: AI-Native Incident Response Now. The defender's playbook for the AI era.
- How to Prepare for the AI-Discovered CVE Wave. The 90-day operational readiness plan.
- Project Glasswing: The New Disclosure Architecture. How disclosure breaks at AI scale.