One of the attractive possibilities of large language models (LLMs) is to accelerate software development by finding and fixing common types of bugs. Now the technology is seeing modest success in generating fixes for well-defined classes of vulnerabilities.
Google announced that its Gemini LLM, for example, can fix 15% of bugs found with a dynamic application security testing (DAST) technique: a small but meaningful gain in efficiency when dealing with the thousands of vulnerabilities produced every year that developers often never get around to prioritizing. Using information from sanitizers (tools that find bugs at runtime), the LLM generated fixes for hundreds of uninitialized values, concurrent data-access violations, and buffer overflows, two Google researchers said in a paper published at the end of January.
The approach could help companies eliminate some of their vulnerability backlog, says Jan Keller, a technical program manager at Google and co-author of the paper.
“Typically, fixing bugs is not something that we security engineers or software engineers are good at because, for us, it’s more interesting to code the latest functionality than to go back and fix sanitizer bugs that are lying around,” he says.
Found by sanitizers, fixed by AI
Google’s approach focuses on fixing vulnerabilities detected by sanitizers: DAST tools that instrument an application and replace memory functions to enable error detection and reporting. Typically, developers test their code with sanitizers after they have a working application and have committed the code, but before the application is released to production. As a result, bugs detected by sanitizers are considered less critical and are slower to get fixed because they don’t block the release of the software, according to Google.
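As an illustration only (the example is not taken from Google’s paper), the short C++ program below contains the kind of heap buffer overflow that a sanitizer such as AddressSanitizer flags at runtime once the binary is built with instrumentation:

    // overflow.cc - illustrative bug of the kind sanitizers report at runtime.
    // Build with instrumentation, e.g.: clang++ -g -fsanitize=address overflow.cc
    #include <cstring>

    int main() {
        char* buf = new char[8];
        // Copies 11 bytes (10 characters plus the terminating '\0') into an
        // 8-byte allocation; AddressSanitizer aborts here with a
        // heap-buffer-overflow report and a stack trace pointing at this line.
        std::strcpy(buf, "0123456789");
        delete[] buf;
        return 0;
    }

Uninitialized values and data races, the other bug classes Google’s researchers targeted, are caught the same way by MemorySanitizer and ThreadSanitizer, respectively.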
When Google researchers decided to try the experiment, they had no idea whether it would work, and they were happy with the initial single-digit success rate, Google’s Keller says.
The main advantage of Google’s approach is that the artificial intelligence (AI) not only suggested patches but that the researchers were able to test the candidate patches automatically, according to Eitan Worcel, CEO and co-founder of Mobb, a startup focused on automating code fixes for developers. Otherwise, the problem would simply shift from finding vulnerabilities in huge amounts of code to testing huge numbers of patches.
“If you have 10, 50, 10,000 [potential vulnerabilities] – which often happens with static analysis scans – you will have hundreds or thousands of results. You won’t be able to use it, right?” he says. “If it’s a tool that doesn’t have great accuracy and doesn’t give you a result that you can rely on, and it just produces a large number of potential vulnerabilities to fix, I don’t see anyone using it. They’ll just go back to not fixing.”
Google, however, had an automated way to test each patch. The researchers compiled the software with the patched code; if it continued to work, they considered the patch to have passed the test.
“We then have an automated build environment where we can test whether the produced fix actually fixes the bug,” says Google’s Keller. “So a lot of the bad patches are filtered out, because we will see that the patched software no longer works and we will know that the patch is bad, or that the vulnerability is still there, or that the bug is still there.”
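As a rough sketch of what such a filtering loop can look like (the build.sh and run_sanitizer_tests.sh helper scripts below are hypothetical stand-ins for a project’s real build and test commands, not anything Google has published):

    // patch_filter.cc - hypothetical sketch of an automated patch filter:
    // apply each LLM-generated candidate, rebuild, and re-run the failing
    // sanitizer test; discard any candidate that breaks the build or the tests.
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    // Runs a shell command and reports whether it exited successfully.
    bool run(const std::string& cmd) {
        return std::system(cmd.c_str()) == 0;
    }

    int main() {
        const std::vector<std::string> candidates = {
            "candidate_0.patch", "candidate_1.patch", "candidate_2.patch"};

        for (const auto& patch : candidates) {
            run("git checkout -- .");                       // start from a clean tree
            if (!run("git apply " + patch)) continue;       // patch does not apply
            if (!run("./build.sh")) continue;               // patched code no longer builds
            if (!run("./run_sanitizer_tests.sh")) continue; // bug still there or tests fail
            std::cout << patch << " passed the automated checks\n";
        }
        return 0;
    }

Only the candidates that survive these checks need to be looked at, which is what keeps the approach from merely shifting the backlog from unfixed bugs to untested patches.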
Automated bug-fixing systems will be increasingly needed as AI tools help developers produce more code and, likely, more vulnerabilities. As companies increasingly use machine learning (ML) models to find bugs, the list of issues that need to be evaluated and fixed will grow, Google notes.
A technology a decade in the making
Using AI/ML models to repair software and create patches is nothing new. In 2015, the nonprofit engineering firm Draper created a system called DeepCode that analyzed huge volumes of software code to find errors and suggest fixes. The software security company Veracode created its own system, dubbed Fix, using a rigorously curated dataset of reference patches for specific classes of vulnerabilities in specific languages to provide suggested fixes for any vulnerabilities its system discovers in a customer’s code base.
Whether a more general method, like Google’s, or a more tailored approach works better remains to be seen, says Chris Eng, head of research at Veracode.
“When you throw a general-purpose AI at a problem and simply expect it to solve any open-ended question you ask, obviously you’re going to get mixed results,” he says. “I think the more you can focus on a particular type of problem that you’re asking it to solve, the better success you’ll have. And experimentally, that’s what we’ve seen as well.”
Bringing AI patches to IT operations
AI/ML models promise not only to create fixes for vulnerabilities discovered during development, but also to help create and apply patches as part of IT operations.
For businesses, the way to fix vulnerable software is to patch the application. The greatest risk in applying patches comes from the adverse side effects that occur when a change in software breaks a production system. LLMs’ ability to sift through data could help ensure this doesn’t happen, says Eran Livne, senior director of endpoint remediation product management at vulnerability management firm Qualys.
“The biggest concern we see from all customers is that they know they have a vulnerability, but they’re worried that if they patch it, something else will break,” he says. “Not, ‘How do we patch this?’ but, ‘What happens if I patch it?'”
Intelligent automation can help developers fix software more efficiently and help companies apply fixes more quickly. In both cases, applying AI to check and fix as many bugs – and systems – as possible can help reduce the backlog of existing vulnerabilities.