September 10, 2024 By Jennifer Gregory

After reading about the recent cybersecurity research by Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang, I had questions. While initially impressed that GPT-4 can exploit the vast majority of one-day vulnerabilities, I started thinking about what the results really mean in the grand scheme of cybersecurity. Most importantly, I wondered how a human cybersecurity professional’s results for the same tasks would compare.

To get some answers, I talked with Shanchieh Yang, Director of Research at the Rochester Institute of Technology’s Global Cybersecurity Institute. He had actually pondered the same questions I did after reading the research.

What are your thoughts on the research study?

Yang: I think that the 87% may be an overstatement, and it would be very helpful if the authors shared more details about their experiments and code so the community could examine them. I look at large language models (LLMs) as a co-pilot for hacking because you have to give them some human instruction, provide some options and ask for user feedback. In my opinion, an LLM is more of an educational training tool than something you ask to hack automatically. I also wondered whether the study meant autonomous, with no human intervention at all.

Compared to even six months ago, LLMs are pretty powerful in providing guidance on how a human can exploit a vulnerability, such as recommending tools, giving commands and even a step-by-step process. They are reasonably accurate, but not necessarily 100% of the time. In this study, one-day could be a pretty big bucket, ranging from a vulnerability that’s very similar to past vulnerabilities to something totally new whose source code doesn’t resemble anything hackers have seen before. In the latter case, there isn’t much an LLM can do because breaking into something new requires human understanding.

The results also depend on whether the vulnerability is in a web service, a SQL server, a print server or a router. There are so many different computing vulnerabilities out there. In my opinion, claiming 87% is an overstatement because it also depends on how many times the authors tried. If I were reviewing this as a paper, I would reject the claim because there is too much generalization.
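To illustrate Yang’s point about attempt counts, here is a minimal sketch (not data from the study; the per-attempt rate and attempt counts are assumed values) showing how a modest per-attempt success rate compounds into a high headline figure when each target gets multiple tries.

```python
# Illustrative sketch with assumed numbers: how repeated attempts per
# vulnerability can inflate a reported "success rate."

def at_least_one_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_rate) ** attempts

if __name__ == "__main__":
    per_attempt_rate = 0.40  # assumed per-attempt success rate, for illustration
    for attempts in (1, 3, 5, 10):
        rate = at_least_one_success(per_attempt_rate, attempts)
        print(f"{attempts:>2} attempt(s): {rate:.0%} chance of at least one success")
```

With these assumed numbers, a 40% per-attempt rate already exceeds 90% after five tries, which is why the number of attempts matters when interpreting a figure like 87%.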

If you timed a group of cybersecurity professionals against an LLM agent head-to-head on a target with unknown but existing vulnerabilities, such as a newly released Hack The Box or TryHackMe challenge, who would complete the hack faster?

Yang: The experts — the people who are actually world-class hackers, ethical hackers, white-hat hackers — would beat the LLMs. They have a lot of tools under their belts. They have seen this before. And they are pretty quick. The problem is that an LLM is a machine, meaning that even the most state-of-the-art models will not give you the commands unless you break the guardrails. With an LLM, the results really depend on the prompts that were used. Because the researchers didn’t share the code, we don’t know what was actually used.

Any other thoughts on the research?

Yang: I would like the community to understand that responsible dissemination is very important — reporting something not just to get people to cite you or talk about your work, but to be responsible. Sharing the experiment and the code, but also sharing what could be done.
