Imagine a world where the software that powers your favorite apps, protects your online transactions, and maintains your digital life could be outsmarted and taken over by a cleverly disguised piece of code. This is not the plot of the latest cyber-thriller; it’s actually been a reality for years now. How this will change – positively or negatively – as artificial intelligence (AI) takes on an increasingly important role in software development is one of the great uncertainties of this brave new world.
In an era where artificial intelligence promises to revolutionize the way we live and work, the debate over its security implications cannot be set aside. As we increasingly rely on AI for tasks ranging from the mundane to the critical, the question is no longer just “Can AI enhance cybersecurity?” (of course!), but also “Can AI be hacked?” (yes!), “Can AI be used to hack?” (of course!) and “Will AI produce secure software?” (well…). This thought leadership article is about the latter. Cydrill (a secure coding training company) delves into the complex landscape of vulnerabilities produced by AI, with a special focus on GitHub Copilot, to highlight the imperative of secure coding practices to safeguard our digital future.
You can test your secure coding skills with this short self-assessment.
The security paradox of artificial intelligence
AI’s leap from academic curiosity to cornerstone of modern innovation happened rather suddenly. Its applications span a wide range of fields, offering solutions that were once the stuff of science fiction. However, this rapid progress and adoption has outpaced the development of corresponding security measures, leaving both AI systems and AI-created systems vulnerable to a variety of sophisticated attacks. Sound familiar? The same thing happened when software itself was conquering more and more areas of our lives…
At the heart of many AI systems is machine learning, a technology that relies on large data sets to “learn” and make decisions. Ironically, AI’s strength – its ability to process and generalize from vast amounts of data – is also its Achilles’ heel. Starting from “everything we can find on the Internet” may not yield the perfect training data; unfortunately, the wisdom of the crowds may not be enough in this case. Additionally, hackers armed with the right tools and knowledge can manipulate this data to trick the AI into making bad decisions or taking malicious actions.
Copilot in the crosshairs
GitHub Copilot, based on OpenAI’s Codex, demonstrates the potential of artificial intelligence in coding. It is designed to improve productivity by suggesting code snippets and even entire blocks of code. However, numerous studies have highlighted the dangers of relying entirely on this technology. It has been demonstrated that a significant portion of the code generated by Copilot can contain security flaws, including vulnerabilities to common attacks such as SQL injection and buffer overflows.
The “Garbage In, Garbage Out” (GIGO) principle is particularly relevant here. AI models, including Copilot, are trained on existing data, and just as with any other large language model, most of this training is unsupervised. If the training data is flawed (which is quite likely, given that it comes from open-source projects or large Q&A sites like Stack Overflow), the output, including code suggestions, may inherit and propagate these flaws. In the early days of Copilot, a study found that approximately 40% of the code samples Copilot produced when asked to complete code based on samples from the CWE Top 25 were vulnerable, underscoring the GIGO principle and the need for greater security awareness. A larger-scale study in 2023 (Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?) showed somewhat better results, but still far from good: when the vulnerable line of code was removed from real-world vulnerability examples and Copilot was asked to complete it, it recreated the vulnerability about one third of the time, and fixed it only about one quarter of the time. In addition, it performed very poorly on vulnerabilities related to missing input validation, producing vulnerable code every single time. This highlights that generative AI is ill-equipped to deal with malicious input when no “silver bullet” solution exists for the vulnerability (such as prepared statements against SQL injection).
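To see why prepared statements act as such a “silver bullet” against SQL injection, consider the following minimal Python sketch (using the standard sqlite3 module; the users table and function names are made up for illustration):

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable: user input is concatenated into the SQL statement, so an
    # input like "' OR '1'='1" changes the structure of the query itself.
    query = "SELECT id, username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Safe: a parameterized (prepared) statement keeps code and data separate;
    # the input can no longer alter the structure of the query.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```

The two versions look almost identical, which is precisely why a code assistant trained on both patterns can so easily reproduce the unsafe one.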
The road to secure AI-powered software development
Addressing the security challenges posed by artificial intelligence and tools like Copilot requires a multifaceted approach:
- Understand vulnerabilities: It is essential to recognize that AI-generated code can be susceptible to the same types of attacks as “traditionally” developed software.
- Elevate secure coding practices: Developers need to be trained in secure coding practices that take into account the nuances of AI-generated code. This involves not only identifying potential vulnerabilities, but also understanding the mechanisms through which the AI suggests certain code fragments, in order to anticipate and mitigate risks effectively.
- SDLC adaptation: It’s not just technology. Processes should also take into account the subtle changes that AI will bring. When it comes to Copilot, code development is usually the focus. But requirements, design, maintenance, testing, and operations can also benefit from large language models.
- Continuous supervision and improvement: Artificial intelligence systems, as well as the tools they power, are constantly evolving. Keeping pace with this evolution means staying informed about the latest security research, understanding emerging vulnerabilities, and updating existing security practices accordingly.
Integrating AI tools like GitHub Copilot into your software development process is not without risk, and requires not only a shift in mindset but also the adoption of solid strategies and technical solutions to mitigate potential vulnerabilities. Here are some practical tips to help developers ensure that using Copilot and similar AI-powered tools improves productivity without compromising security.
Implement rigorous input validation!
Practical implementation: Defensive programming is always at the heart of secure coding. When accepting code suggestions from Copilot, especially for functions that handle user input, implement strict input validation measures. Define rules for user input, create an allowlist of permitted characters and data formats, and ensure inputs are validated before processing. You can also ask Copilot to do this for you; sometimes it actually works really well!
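As a minimal sketch of what such allowlist-based validation can look like in Python (the field names and format rules below are illustrative assumptions, not prescriptions):

```python
import re

# Allowlists: only these characters and formats are accepted; everything else is rejected.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,32}$")   # letters, digits, underscore only
QUANTITY_RE = re.compile(r"^[0-9]{1,4}$")           # 1-4 digit non-negative integer

def validate_username(raw: str) -> str:
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw

def validate_quantity(raw: str) -> int:
    if not QUANTITY_RE.fullmatch(raw):
        raise ValueError("invalid quantity")
    return int(raw)

# Validate before any processing, i.e. before the value ever reaches a query or a command.
username = validate_username("alice_42")
quantity = validate_quantity("3")
```

Rejecting everything that does not match the allowlist is safer than trying to enumerate and block “bad” characters.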
Manage dependencies safely!
Practical implementation: Copilot may suggest adding dependencies to your project, and attackers may exploit this to mount supply chain attacks via “package hallucinations.” Before incorporating suggested libraries, manually verify their security status by checking for known vulnerabilities in databases like the National Vulnerability Database (NVD), or perform a software composition analysis (SCA) with tools like OWASP Dependency-Check or npm audit for Node.js projects. These tools can also monitor and manage the security of dependencies automatically.
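As a rough first filter against hallucinated package names in Python projects, you can check whether a suggested dependency actually exists before adding it. The sketch below assumes PyPI’s public JSON endpoint (https://pypi.org/pypi/&lt;name&gt;/json); mere existence proves nothing about trustworthiness, so an SCA check still has to follow:

```python
import json
import urllib.error
import urllib.request

def exists_on_pypi(package_name: str) -> bool:
    """Return True if the package name is registered on PyPI.

    This is only a first filter against hallucinated package names;
    it says nothing about whether the package is safe to use.
    """
    url = f"https://pypi.org/pypi/{package_name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return "info" in json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown package: possibly a hallucination
            return False
        raise

if __name__ == "__main__":
    # The second name is deliberately made up to show the negative case.
    for candidate in ("requests", "reqeusts-secure-pro"):
        print(candidate, "->", "found" if exists_on_pypi(candidate) else "NOT on PyPI")
```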
Conduct periodic security assessments!
Practical implementation: Regardless of the source of the code, be it AI-generated or hand-crafted, perform regular code reviews and security-focused testing. Combine approaches: test statically (SAST) and dynamically (DAST), and perform software composition analysis (SCA). Run manual tests and complement them with automation. But remember to put people above tools: no tool or artificial intelligence can replace natural (human) intelligence.
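One way to make such checks repeatable is to encode them as automated tests in your pipeline. The snippet below is a toy sketch (using pytest and an in-memory SQLite database; the table and function names are invented for illustration) of a regression test asserting that a classic SQL injection payload does not slip through a user-lookup function:

```python
import sqlite3
import pytest

def find_user(conn, username):
    # Function under test: deliberately uses a parameterized query.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()

@pytest.fixture
def conn():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    db.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    yield db
    db.close()

def test_sql_injection_payload_matches_no_rows(conn):
    # This payload would dump every row if the query were built by string
    # concatenation; with a parameterized query it must match nothing.
    assert find_user(conn, "' OR '1'='1") == []
```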
Be gradual!
Practical implementation: First, let Copilot write your comments or debug logs – it’s already pretty good at these. Any errors in these will still not affect the security of your code. Then, once you get familiar with how it works, you can gradually let it generate more and more code snippets for the actual functionality.
Always check what Copilot offers!
Practical implementation: Never blindly accept what Copilot suggests. Remember, you are the pilot, it’s “just” the Co-pilot! You and Copilot can make a very effective team together, but you’re still in charge, so you need to know what the expected code is and what the outcome should look like.
Experiment!
Practical implementation: Try different prompts and suggestions (in chat mode). Ask Copilot to refine the code if you’re not satisfied with what you got. Try to understand how Copilot “thinks” in certain situations and get a feel for its strengths and weaknesses. Moreover, Copilot gets better over time, so keep experimenting!
Stay informed and educated!
Practical implementation: Continuously educate yourself and your team on the latest security threats and best practices. Follow security blogs, attend webinars and workshops, and participate in secure coding forums. Knowledge is a powerful tool for identifying and mitigating potential vulnerabilities in code, whether AI-generated or not.
Conclusion
Secure coding practices have never been more important than now, as we navigate the uncharted waters of AI-generated code. Tools like GitHub Copilot present significant opportunities for growth and improvement, but also unique challenges when it comes to the security of code. Only by understanding these risks can we successfully balance effectiveness with security and keep our infrastructure and data protected. On this journey, Cydrill remains committed to providing developers with the knowledge and tools needed to build a more secure digital future.
Cydrill’s blended learning path provides proactive and effective secure coding training for developers at Fortune 500 companies around the world. Combining instructor-led training, e-learning, hands-on labs, and gamification, Cydrill offers a novel and effective approach to learning secure coding.
Check out Cydrill’s secure coding courses.