AI Hacking Tools Still Need Human Handlers to Work

Anthropic and OpenAI have released powerful AI models capable of finding tens of thousands of software bugs across operating systems, but early testing reveals a critical limitation: they don't work well without experienced humans in the loop.

Security firms testing Mythos and GPT-5.5-Cyber this week discovered the models excel at identifying vulnerabilities and chaining low-severity flaws into functional attack sequences. Palo Alto Networks found 75 bugs using both systems, compared to its typical monthly haul of 5 to 10. Microsoft's new AI-powered security agent uncovered 16 new weaknesses in Windows networking and authentication. Yet all the major testers reached the same uncomfortable conclusion: raw AI output is unreliable without skilled security researchers validating, filtering and guiding the process.

The problem is both technical and practical. XBOW, an AI-powered penetration testing startup, found that Mythos struggled to validate whether the exploits it discovered actually worked, sometimes assessing them too cautiously or too literally. Palo Alto Networks saw a false positive rate of roughly 30% across its products. Daniel Stenberg, lead developer of the open-source Curl project, reported that Mythos flagged one genuine low-severity bug alongside several false alarms and issues the team deemed insignificant.

Cisco's newly released "Foundry Security Spec" blueprint highlights the core challenge. The company found that frontier AI models produce "fluent, confident, plausible vulnerability claims that are wrong at a rate that makes unreviewed output worthless." Instead of asking systems to be more careful, Cisco researchers got better results by instructing AI to make claims verifiable and then explicitly check its own work, a technique enterprises are beginning to adopt.
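
Cisco hasn't published its exact prompts, but the technique it describes amounts to a two-pass loop: demand machine-checkable evidence alongside every claim, then feed each claim back to the model for an explicit verification pass before a human sees it. Here is a minimal sketch of that pattern; the prompt wording and the call_model() helper are illustrative assumptions, not the actual Foundry Security Spec.

```python
import json

def call_model(prompt: str) -> str:
    """Placeholder for any chat-completion API call (assumption)."""
    raise NotImplementedError

FIND_PROMPT = """Report suspected vulnerabilities in the code below.
For EACH claim, attach verifiable evidence: the exact file and line range,
the tainted input path, and a concrete input that triggers the flaw.
Return a JSON list:
[{{"id": ..., "file": ..., "lines": ..., "claim": ..., "evidence": ..., "trigger": ...}}]

{code}"""

CHECK_PROMPT = """You previously made this vulnerability claim:

{claim}

Re-examine the evidence step by step. Does the trigger input actually
reach the flawed code and cause the stated effect? Return JSON:
{{"id": ..., "verdict": "confirmed" or "retracted", "reason": ...}}"""

def find_and_self_check(code: str) -> list[dict]:
    """Two-pass loop: generate evidence-backed claims, then make the model
    explicitly verify each one before it reaches a human reviewer."""
    claims = json.loads(call_model(FIND_PROMPT.format(code=code)))
    confirmed = []
    for claim in claims:
        verdict = json.loads(call_model(CHECK_PROMPT.format(claim=json.dumps(claim))))
        if verdict.get("verdict") == "confirmed":
            confirmed.append(claim)  # still subject to human triage
    return confirmed
```

The design choice is the one Cisco highlights: rather than telling the model to be careful, the first prompt forces claims into a form that can be checked, and the second forces the check to happen.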

Albert Ziegler, head of AI at XBOW, summed up the dynamic plainly: "A model is a brain without a body." The brain requires a human operator whose expertise and control can match its power, he added.

The human dependency cuts both ways. While organizations will have trained security staff to manage false positives and validate findings, the same models could give nation-state hackers and criminal groups more sophisticated attack capabilities. Palo Alto Networks chief product officer Lee Klarich noted that adversaries won't face the same learning curve; attackers already understand how to exploit software and will refine their use of these tools faster. Research from the U.K. AI Security Institute published this week showed that additional computing power and inference-time scaling alone can substantially boost autonomous cyber capabilities without waiting for new model releases.
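
The institute's report describes the effect rather than publishing code, but the basic idea behind inference-time scaling is simple: spend more compute per task by sampling many candidate attempts and keeping whichever passes an automated check. A minimal best-of-N sketch, where attempt() and succeeded() are hypothetical stand-ins for one agent rollout and its success test:

```python
def attempt(task: str, seed: int) -> str:
    """Placeholder: one stochastic agent rollout against the task (assumption)."""
    raise NotImplementedError

def succeeded(task: str, result: str) -> bool:
    """Placeholder: automated success check, e.g. a PoC reproducing (assumption)."""
    raise NotImplementedError

def best_of_n(task: str, n: int) -> str | None:
    """Spend n rollouts of compute and return the first verified success.
    The solve rate rises with n even though the underlying model is unchanged."""
    for seed in range(n):
        result = attempt(task, seed)
        if succeeded(task, result):
            return result
    return None
```

That is why the finding matters: an attacker can buy capability with GPUs alone, without waiting for a new model release.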

As author James Rodriguez put it: "These models are powerful amplifiers, not replacements, and that asymmetry between offense and defense isn't going away anytime soon."
