AI agents show they can create exploits, not just find vulns


Mythos and GPT-5.5 muscle out the competition

Sure, AI agents such as Mythos can find security vulnerabilities in software, but the bigger question is whether they can turn those flaws into functional exploits that work in the real world. After all, many AI-discovered bugs prove minor or difficult to weaponize. New research, however, suggests frontier models can indeed develop working exploits when directed to do so.

To better understand the rapidly changing security landscape, computer scientists from UC Berkeley, Max Planck Institute for Security and Privacy, UC Santa Barbara, Arizona State University, Anthropic, OpenAI, and Google decided to build ExploitGym, a benchmark for evaluating the exploitation capabilities of AI agents.

This is not an entirely disinterested set of investigators – Anthropic, OpenAI, and Google all sell AI services. And both Anthropic and OpenAI have talked up the risks posed by their leading models, Claude Mythos Preview and GPT-5.5, while selling access to government partners.

Since Anthropic announced Mythos in early April, the security community has been critical of the company’s approach, described by some as fear-mongering. And various security experts have made the case that even commercially available AI models can find security flaws.

Nonetheless, Mythos and GPT-5.5 outshine their peers in ExploitGym, as described in the paper, “ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?”

ExploitGym consists of 898 real vulnerabilities found in userspace applications, Google's V8 JavaScript engine, and the Linux kernel. Each workout presents an AI agent with a vulnerability and a proof-of-concept input that triggers it, to see whether the agent can craft an exploit capable of arbitrary code execution.
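
The paper's harness isn't reproduced here, but the shape of a task is easy to picture. Below is a minimal Python sketch of what one ExploitGym-style instance and its success check might look like; the TaskInstance fields and the check_exploit helper are illustrative inventions, not the benchmark's actual API.

```python
from dataclasses import dataclass
import subprocess

@dataclass
class TaskInstance:
    # Illustrative schema; the real benchmark's format isn't published here.
    target: str        # e.g. "userspace", "v8", or "kernel"
    bug_id: str        # identifier of the known vulnerability
    poc_input: bytes   # crashing input handed to the agent as a starting point
    sentinel: bytes    # marker an exploit must emit to prove code execution

def check_exploit(task: TaskInstance, exploit_path: str,
                  timeout_s: int = 120) -> bool:
    """Run the agent-written exploit and look for the sentinel in its
    output, standing in here for proof of arbitrary code execution."""
    try:
        result = subprocess.run(
            [exploit_path], input=task.poc_input,
            capture_output=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return task.sentinel in result.stdout
```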

According to the UC Berkeley Center for Responsible Decentralized Intelligence, Mythos Preview successfully exploited 157 test instances and GPT-5.5 managed 120 in the allotted two-hour window.

“Even when standard security defenses like ASLR or the V8 sandbox were turned on, a meaningful number of exploits still worked,” the boffins wrote in a blog post. “More strikingly, agents sometimes discovered and exploited entirely different vulnerabilities than the ones they were pointed at.”

The agents (CLI + model) tested were Claude Code with Claude Opus 4.6, Claude Opus 4.7, Claude Mythos Preview, and GLM-5.1; Codex CLI with GPT-5.4/GPT-5.5; and Gemini CLI with Gemini 3.1 Pro. And even the ancient models released in February (Opus 4.6 and Gemini 3.1 Pro) had some success.

Agent success rates by category (userspace, browser V8, kernel), with cost and time, across the models tested:

Model                   Agent         Total    U    B    K   Cost (USD)       Time (min)
                                                             Succ.    Full    Succ.    Full
Claude Mythos Preview†  Claude Code     157  107   38   12    54.7            102.1
Claude Opus 4.6         Claude Code      15   12    2    1    8.08   21.76     18.1    66.7
Claude Opus 4.7         Claude Code       7    4    3    0    8.64    3.40     22.1    14.4
Gemini 3.1 Pro          Gemini CLI       12   10    2    0    8.56    9.02     51.1    75.6
GLM-5.1                 Claude Code       4    4    0    0    3.75    6.39     63.3   118.0
GPT-5.4                 Codex CLI        54   38   15    1   12.20   25.43     51.1   103.5
GPT-5.5                 Codex CLI       120   71   27   22   22.99   34.55     49.6    69.8

U = Userspace · B = Browser V8 · K = Kernel
Succ. = successful runs · Full = full benchmark
† preview model · ‡ see notes

The researchers say that one of their more interesting findings is that these models sometimes went “off-script” in capture-the-flag (CTF) environments, where an agent has to find and retrieve some hidden value.

This was most evident with Mythos Preview and GPT-5.5. The former succeeded in 226 CTF exercises but used the intended bug in only 157 of them, while the latter captured 210 flags, only 120 of those via the intended bug.
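
That gap between flags captured and intended bugs used falls out of how CTF-style scoring works: the grader checks only whether the agent produced the secret value, not which hole it crawled through to get it. A toy Python sketch, with an invented run format rather than the paper's, of counting both figures:

```python
def tally(runs):
    """Count flags captured vs. wins that used the intended bug.

    Each run is an assumed (transcript, flag, used_intended_bug) triple;
    the last field would come from triaging the agent's exploit afterwards.
    """
    captured = sum(1 for transcript, flag, _ in runs
                   if flag in transcript)
    on_script = sum(1 for transcript, flag, used in runs
                    if flag in transcript and used)
    return captured, on_script

# Under this scheme, Mythos Preview's results would tally as (226, 157)
# and GPT-5.5's as (210, 120).
```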

The authors also note that while there was some overlap in the exploits discovered, each model found exploits the others missed. This suggests that applying a diverse set of models might be advantageous in both attack and defense scenarios.
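
Treating each model's solved instances as a set makes that diversity argument concrete: the union an ensemble covers can be meaningfully larger than any single model's tally. A quick illustration with invented instance IDs, not the paper's data:

```python
# Hypothetical solved-instance IDs per model -- illustrative only.
solved = {
    "model_a": {1, 2, 3, 5, 8},
    "model_b": {2, 3, 13, 21},
    "model_c": {1, 34, 55},
}

union = set().union(*solved.values())
best_single = max(solved.values(), key=len)
print(f"best single model: {len(best_single)} instances")  # 5
print(f"ensemble union:    {len(union)} instances")        # 9
```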

It’s worth adding that ExploitGym tests were done with security guardrails disabled. When the test was re-run on GPT-5.5 with default safety filters active, the model refused 88.2 percent of the time before making any tool call. 

The Register, however, has seen security researchers craft prompts in ways that avoid triggering refusals. So safeguards of that sort have limits.
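
For what it's worth, measuring a refusal rate like that 88.2 percent figure is simple in principle: classify each run by whether the model refused before issuing its first tool call. A minimal sketch, assuming an event-list transcript format of our own devising:

```python
def refused_before_tools(events: list[dict]) -> bool:
    """True if the run shows a refusal with no tool call ever issued.

    `events` uses an assumed format: dicts with a 'type' of either
    'tool_call' or 'message', messages optionally flagged 'refusal'.
    """
    for event in events:
        if event["type"] == "tool_call":
            return False
        if event["type"] == "message" and event.get("refusal"):
            return True
    return False

def refusal_rate(runs: list[list[dict]]) -> float:
    return sum(map(refused_before_tools, runs)) / len(runs)
```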

“Our results show that autonomous exploit development by frontier AI agents is no longer a hypothetical capability,” the authors state in their paper. “While current agents are not yet reliable across all targets, they already exploit a non-trivial fraction of real-world vulnerabilities, including complex targets such as kernel components.” ®
