AI + ML
UK researchers find LLMs are learning to finish jobs faster and improving all the time
The UK AI Security Institute (AISI) has found that frontier models are quickly becoming more efficient when asked to do some cybersecurity work.
AISI measures this with its “time window benchmark for cybersecurity,” which estimates how much work an AI can do compared to a human. The benchmark yields findings such as: Claude Sonnet 4.5 can complete, about 80 percent of the time, tasks that take a human cybersecurity expert 16 minutes, given a budget of 2.5 million tokens.
AISI has found the human-comparable task time – 16 minutes in this instance – is growing, fast. If tokens flowed freely instead of being arbitrarily capped, AI models might do better still.
In February 2026, AISI internally reduced the expected task time doubling period from 8 to 4.7 months, based on progress made since late 2024.
With the release of Anthropic Mythos Preview and OpenAI GPT-5.5, AISI has once again had to compress its projected doubling period.
“In February 2026, we estimated that frontier models’ 80 percent-reliability cyber time horizon had doubled every 4.7 months since reasoning models emerged in late 2024, given a 2.5M token limit,” the AISI said in a post on Wednesday.
“This was around half our November 2025 doubling time estimate, which was 8 months for both 50 percent and 80 percent reliability. Claude Mythos Preview and GPT-5.5 have since significantly outperformed this trend.”
The recalculated doubling time estimate, given what Mythos Preview and GPT-5.5 can do, is even shorter than 4.7 months. AISI does not cite a specific value, but points to similar time horizon estimates, based on measurements of a broader skillset, software engineering, made by non-profit AI research house METR.
“Their results imply a consistent doubling time of 4.2 months on software tasks since late 2024,” AISI said, noting that with the latest Mythos Preview checkpoint (model update), it’s closer to 4 months.
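The arithmetic behind these projections is simple exponential growth. As an illustrative sketch only (this is not AISI's or METR's methodology, and the starting 16-minute horizon and 4.7-month doubling period are taken from the figures quoted above), the human-comparable task time after a given number of months can be computed like this:

```python
def time_horizon(start_minutes: float, months_elapsed: float,
                 doubling_months: float) -> float:
    """Task-time horizon after `months_elapsed`, assuming it doubles
    every `doubling_months` months (pure exponential growth)."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

# With a 16-minute horizon today and a 4.7-month doubling period,
# one year out the horizon would be roughly 94 minutes:
print(round(time_horizon(16, 12, 4.7), 1))  # prints roughly 93.9
```

If the doubling period has in fact compressed toward 4 months, as the latest Mythos Preview checkpoint suggests, the same calculation gives a steeper curve, which is why small changes in the doubling estimate matter so much for projections.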
Note that the time window benchmark is not a broad assessment of capabilities – AISI is not saying frontier models are becoming twice as capable by all measures. It’s a narrow assessment based on the time it takes people to accomplish security tasks.
Citing a different metric, AISI says the latest Mythos Preview checkpoint solved a 32-step simulated corporate network attack called “The Last Ones” in six of 10 attempts and managed to complete a previously unsolved challenge, a seven-step industrial control system attack called “Cooling Tower,” in three of 10 attempts.
As a point of comparison, when Opus 4.6 was evaluated in February 2026, it completed a maximum of 22 of 32 steps for The Last Ones. That model managed to reach milestone 6, which involves reverse-engineering a Windows service binary to access encrypted credentials, escalating privileges via token impersonation, and recovering a cryptographic key to access a command-and-control management service.
“Frontier AI’s autonomous cyber and software capability is advancing quickly: the length of cyber tasks that frontier models can complete autonomously has doubled on the order of months, not years,” AISI concludes. “What this evidence does not tell us is how the pace of progress will evolve, when AI will reach any particular capability threshold, or how these capabilities will translate against defended, real-world systems.”
The curl project offers one data point on the real-world implications of the latest frontier models: Mythos managed to find just one confirmed vulnerability in its codebase.
But watch this space. ®