AttackFeed by Joe Wagner | Cybersecurity News from Across the Internet


Researchers say AI just broke every benchmark for autonomous cyber capability – CyberScoop

Posted on May 13, 2026 by Greg Otto

Two of the most advanced artificial intelligence models — Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 — have significantly surpassed the already-accelerating pace at which AI systems are completing autonomous cybersecurity tasks, according to separate findings published Wednesday by the United Kingdom’s AI Security Institute (AISI) and Palo Alto Networks.

The AISI, which conducts pre-deployment evaluations of frontier AI models on behalf of the British government, said both Claude Mythos Preview and GPT-5.5 have substantially exceeded the doubling trend the institute had been tracking since late 2024. Whether the results represent an isolated capability jump or the start of a new, faster trajectory remains unclear.

The AISI estimated earlier this year that frontier models’ 80% reliability cyber time horizon had been doubling approximately every five months. The metric, a proxy for AI autonomy, is the length of task (measured by how long it takes a human expert) that a model completes with 80% reliability. The five-month figure was itself roughly half the eight-month doubling time the institute estimated in November 2025. Mythos Preview and GPT-5.5 have since outperformed every trend line the institute has measured.
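To make the doubling-time figures concrete, the short sketch below converts a doubling time into an implied growth factor over a fixed period. It is only illustrative arithmetic on the two doubling times quoted above; the function name `growth_factor` is a hypothetical helper, and nothing here reproduces the AISI's own methodology.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth over `months_elapsed` for a quantity
    that doubles every `doubling_months` months."""
    return 2.0 ** (months_elapsed / doubling_months)


# Doubling times quoted in the article (months).
for label, doubling in [("eight-month doubling", 8.0), ("five-month doubling", 5.0)]:
    yearly = growth_factor(12, doubling)
    print(f"{label}: time horizon grows about {yearly:.1f}x per year")
```

Under these assumptions, an eight-month doubling time implies a little under a threefold increase per year, while a five-month doubling time implies a little over fivefold.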

“Frontier AI’s autonomous cyber and software capability is advancing quickly: the length of cyber tasks that frontier models can complete autonomously has doubled on the order of months, not years,” the AISI wrote.

The clearest evidence of the capability jump came from the AISI’s cyber ranges, its structured simulations of multi-stage attacks against small, undefended enterprise networks. A newer checkpoint of Claude Mythos Preview became the first model to complete both of the institute’s ranges. It solved “The Last Ones,” a 32-step simulated corporate network attack, in 6 of 10 attempts, and completed “Cooling Tower” — previously unsolved by any model — in 3 of 10 attempts. GPT-5.5 solved “The Last Ones” in 3 of 10 attempts.
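Success counts out of 10 attempts carry wide statistical uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for each reported result. This is a standard textbook interval chosen here for illustration, not an analysis the AISI published, and the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 for ~95%)."""
    p = successes / attempts
    denom = 1.0 + z * z / attempts
    center = (p + z * z / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts ** 2)) / denom
    return center - half_width, center + half_width


# Success counts reported in the article, each out of 10 attempts.
results = {
    "Mythos Preview, The Last Ones": 6,
    "Mythos Preview, Cooling Tower": 3,
    "GPT-5.5, The Last Ones": 3,
}
for name, wins in results.items():
    low, high = wilson_interval(wins, 10)
    print(f"{name}: {wins}/10, 95% interval roughly {low:.0%} to {high:.0%}")
```

With only 10 attempts per range, the intervals are broad, which is one reason a single result is better read as evidence of direction than as a precise success rate.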

Palo Alto Networks reached similar conclusions through its own testing. The company said it began testing Claude Mythos in April as a launch partner for Anthropic’s Project Glasswing, and has since tested Claude Opus 4.7 and OpenAI’s GPT-5.5-Cyber as part of OpenAI’s Trusted Access for Cyber program.

“The latest models are extraordinarily capable at finding vulnerabilities and changing them into critical exploit paths in near-real-time,” Palo Alto Networks wrote.

The company released security advisories covering 26 CVEs representing 75 issues, all identified through AI model scanning across more than 130 products. Its typical monthly volume is fewer than five CVEs. The company said all important vulnerabilities in its SaaS products had been patched, and that patches were available for all customer-operated products.

The AISI was careful to note the limits of its data. The estimates are based on a relatively small number of models, and the hardest tasks in the test suite have the least human comparison data. Even so, the institute said the overall trend holds up: dropping any single model from the analysis shifts the estimated doubling time by less than a month in either direction. Separate research from METR, a nonprofit that tracks how quickly AI handles software tasks, arrived at a nearly identical figure: a doubling time of approximately four months since late 2024.
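The robustness check described above, dropping one model at a time and re-estimating the doubling time, can be sketched as a leave-one-out fit. The example below is a minimal illustration under stated assumptions: it fits a straight line to log2(time horizon) against time by ordinary least squares, and the data points are synthetic placeholders, not AISI or METR measurements. The function names are hypothetical.

```python
import math


def doubling_time_months(points):
    """Least-squares fit of log2(horizon) against months; returns months per doubling."""
    n = len(points)
    xs = [month for month, _ in points]
    ys = [math.log2(horizon) for _, horizon in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per month
    return 1.0 / slope


def leave_one_out(points):
    """Re-estimate the doubling time with each point removed in turn."""
    return [doubling_time_months(points[:i] + points[i + 1:]) for i in range(len(points))]


# SYNTHETIC placeholder data: (months since first model, time horizon in minutes).
# These values are invented for illustration and are NOT real measurements.
synthetic = [(0, 10.0), (4, 18.0), (8, 29.0), (12, 55.0), (16, 90.0), (20, 165.0)]

full_estimate = doubling_time_months(synthetic)
largest_shift = max(abs(d - full_estimate) for d in leave_one_out(synthetic))
print(f"full-data doubling time: {full_estimate:.2f} months")
print(f"largest leave-one-out shift: {largest_shift:.2f} months")
```

If the largest shift is small relative to the estimate, no single data point is driving the trend, which is the property the institute reported for its own analysis.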

“No single benchmark result should be read as a precise measure of AI capability,” the AISI wrote. “Regardless, the direction of change and rapid growth have been consistent across the models, methodological choices and independent data we examined.”

Palo Alto Networks outlined four immediate priorities for enterprises as these models continue to grow in usage: First, find and fix vulnerabilities in code and applications before attackers do. Second, shrink the attack surface and use AI to spot security misconfigurations. Third, deploy detection and response tools across all systems, using machine learning to catch threats in real time. Fourth, build security operations fast enough to respond in minutes, because AI-powered attacks may soon unfold that quickly.

The AISI said it is developing more demanding evaluations, including new cyber ranges and the addition of active cyber defenses, to better reflect real-world conditions as model capabilities continue to advance.

The post Researchers say AI just broke every benchmark for autonomous cyber capability appeared first on CyberScoop.
