A worm that uses intelligently timely engineering and injection is capable of tricking generative artificial intelligence (GenAI) apps like ChatGPT into propagating malware and more.
In a lab setting, three Israeli researchers demonstrated how an attacker could design “self-replicating adversary prompts” that convince a generative model to replicate input as output: If a malicious prompt arrives, the model will turn around and reject it , allowing it to spread to other AI agents. Prompts can be used to steal information, spread spam, poison models, and more.
They called it “Morris II” after the infamous 99-line self-propagating malware that destroyed a tenth of the entire Internet in 1988.
AI app “ComPromptMized”.
To demonstrate how self-replicating AI malware could work, researchers created an email system that can receive and send emails using generative artificial intelligence.
Next, as a red team, they wrote a timely email that leverages Retrieval Augmented Generation (RAG) – a method used by AI models to retrieve trusted external data – to contaminate the receiving email assistant’s database . When the email is retrieved from the RAG and sent to the gen AI model, jailbreaks itforcing it to exfiltrate sensitive data and replicate its input as output, thus transmitting the same instructions to subsequent hosts.
The researchers also showed how contradictory a suggestion can be encoded in an image with a similar effect, forcing the email assistant to forward the poisoned image to new hosts. With either of these methods, an attacker could automatically propagate spam, propaganda, malware payloads, and further malicious instructions through a continuous chain of AI-embedded systems.
New malware, old problem
Most of today’s most advanced threats to AI models are just new versions of the oldest security problems in computing.
“While it is tempting to see these as existential threats, in terms of threat they are no different to the use of SQL injection and similar injection attackswhere attackers abuse text input spaces to insert additional commands or queries into seemingly sanitized input,” says Andrew Bolster, senior research and development manager for data science at Synopsys. “As the research points out, this is a 35-year-old idea that still has legs (older actually; the father of modern computer theory John Von Neumann theorized this in the 1950s and 1960s).”
Part of what made the Morris worm novel in its time three decades ago was that it figured out how to move data space into the part of the computer that exercises controls, allowing a Cornell graduate student to escape the confines of a normal user. and influence what a targeted computer does.
“A core of computer architecture, as long as computers have existed, has been this conceptual overlap between data space and control space: the control space is made up of the program instructions that are followed and therefore the data that are ideally in a controlled zone format,” explains Bolster.
Smart hackers today use GenAI suggestions to largely the same effect. And so, just like software developers before them, to defend themselves, AI developers will need a way to ensure that their programs don’t confuse user input with machine output. Developers can offload some of this responsibility to API rules, but a deeper solution could involve breaking down the AI models themselves into constituent parts. This way, data and control don’t live side by side in the same big house.
“We’re really starting to work on: how do we move from this all-in-one approach to a more distributed agent approach,” Bolster says. “If you really want to look at it, this is kind of analogous to moving the microservices architecture away from one big monolith. With everything in a services architecture, you’re able to put runtime content gateways between and around different services. So you can as a system operator can ask “Why does my email agent express things like images?” and impose constraints.”