Google Researchers Claim First Vulnerability Found Using AI

Summary: Researchers from Google Project Zero and Google DeepMind have identified their first real-world vulnerability using a large language model (LLM), specifically an exploitable stack buffer underflow in SQLite. The flaw was reported and fixed immediately, highlighting the potential of AI-assisted vulnerability research while also exposing limitations in current fuzzing methods.

Threat Actor: Google Project Zero and Google DeepMind
Victim: SQLite

Key Points:

  • The vulnerability was discovered through the Big Sleep project, which utilizes AI to assist in vulnerability research.
  • Current fuzzing methods failed to detect the vulnerability due to specific configurations and code versions not being tested.
  • AI-assisted methods may help in identifying and analyzing vulnerabilities more effectively in the future.
  • Previous claims of LLM-assisted vulnerability discoveries exist, indicating a growing trend in AI’s role in cybersecurity.

Researchers from Google Project Zero and Google DeepMind have found their first real-world vulnerability using a large language model (LLM).

In a November 1 blog post, Google Project Zero researchers said the vulnerability is an exploitable stack buffer underflow in SQLite, a widely used open-source database engine.
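For context, a stack buffer underflow occurs when code writes (or reads) before the start of a stack-allocated buffer. The minimal C sketch below illustrates the general bug class only – it is not the actual SQLite flaw, and the function and variable names are hypothetical:

```c
#include <string.h>

/* Illustrative only: a generic stack buffer underflow pattern, not the
 * actual SQLite bug. The index is bounds-checked against the upper
 * limit but not the lower one, so a negative index writes before the
 * start of buf, corrupting adjacent stack memory (other locals, saved
 * registers or a stack canary). */
void store_byte(int idx, char value) {
    char buf[16];
    memset(buf, 0, sizeof(buf));

    if (idx < (int)sizeof(buf)) {   /* missing check: idx >= 0 */
        buf[idx] = value;           /* idx == -1 writes before buf */
    }
}
```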

A team from Project Zero and DeepMind, working under the Big Sleep project, found the flaw in early October before it appeared in an official release. They immediately reported it to the developers, who fixed it the same day. SQLite users were not impacted.

“The vulnerability is quite interesting, along with the fact that the existing testing infrastructure for SQLite (both through OSS-Fuzz and the project’s own infrastructure) did not find the issue, so we did some further investigation,” the Big Sleep researchers wrote.

From Naptime Framework to Big Sleep Project

The hybrid team’s AI-powered vulnerability research builds on the work started in 2023 within Project Zero to develop Naptime, a framework enabling an LLM to assist vulnerability researchers.

The framework’s architecture centers on the interaction between an AI agent and a target codebase, with the agent given a set of specialized tools designed to mimic the workflow of a human security researcher.

Infosecurity reported on Naptime in June 2024.

Filling the Fuzzing Failures Gap

While the Big Sleep researchers highlighted that the project is still in the early stages and they only have highly experimental results, they also believe it has “tremendous defensive potential.”

Currently, the most common way developers test software before it goes into production is fuzzing.

Also known as fuzz testing, fuzzing involves providing invalid, unexpected or random data as inputs to a computer program or software. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
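As a rough sketch of what this looks like in practice, a fuzz target for a library such as SQLite can take each generated input, feed it to the library as a SQL statement and rely on crashes, assertion failures or sanitizer reports to flag bugs. The following libFuzzer-style harness is illustrative only, assuming an in-memory database; the real OSS-Fuzz harnesses for SQLite are considerably more involved:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

/* Minimal libFuzzer-style harness sketch (illustrative only, not the
 * harness OSS-Fuzz actually uses for SQLite). Each fuzzer-generated
 * input is treated as a SQL statement and executed against an
 * in-memory database; the fuzzer then watches for crashes, assertion
 * failures and sanitizer reports. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    sqlite3 *db = NULL;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
        sqlite3_close(db);
        return 0;
    }

    /* Copy the raw input into a NUL-terminated string for sqlite3_exec(). */
    char *sql = malloc(size + 1);
    if (sql != NULL) {
        memcpy(sql, data, size);
        sql[size] = '\0';
        /* SQL errors are ignored; only crashes and memory errors matter. */
        sqlite3_exec(db, sql, NULL, NULL, NULL);
        free(sql);
    }

    sqlite3_close(db);
    return 0;
}
```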

However, this method failed to detect the SQLite vulnerability, for a combination of reasons. The bottom line is that the fuzzing setups – the automated testing tools – lacked the specific configurations and code versions needed to trigger the issue.

Another common issue is that unknown vulnerabilities, also known as zero-days, are often variants of known and fixed vulnerabilities.

“As this trend continues, it’s clear that fuzzing is not succeeding at catching such variants, and that for attackers, manual variant analysis is a cost-effective approach,” the Big Sleep researchers wrote.

“By providing a starting point – such as the details of a previously fixed vulnerability – we remove a lot of ambiguity from vulnerability research, and start from a concrete, well-founded theory: ‘This was a previous bug; there is probably another similar one somewhere.’”

While the researchers conceded that, overall, fuzzing will continue to be at least as effective as LLM-assisted vulnerability analysis, they hope “AI can narrow this gap.”

“We hope that in the future this effort will lead to a significant advantage to defenders – with the potential not only to find crashing test cases, but also to provide high-quality root-cause analysis, triaging and fixing issues could be much cheaper and more effective in the future.” 

At this time, the Big Sleep researchers only use small programs with known vulnerabilities to evaluate the progress of their method.

Previous Records of Successful LLM-Assisted Vulnerability Research

While the researchers claimed this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software, Alfredo Ortega, a security researcher at Neuroengine, said on X that he managed to discover a zero-day in OpenBSD using LLMs back in April 2024 – and he published his results in June.

He also mentioned the work of Google’s Open Source Security Team, which found an out-of-bounds read in OpenSSL in October.

“I think it’s just an honest mistake, quite common in academic circles. Academics are usually not super aware of what happens outside their circle. They cannot know everything that is published in the field. But they just needed to Google it,” he told Infosecurity.


Source: https://www.infosecurity-magazine.com/news/google-first-vulnerability-found