Up to 100 malicious artificial intelligence (AI)/machine learning (ML) models have been discovered on the Hugging Face platform.
These include cases where loading a pickle file leads to code execution, software supply chain security firm JFrog said.
“The model’s payload grants the attacker a shell on the compromised machine, allowing him to gain full control over victim machines through what is commonly referred to as a ‘backdoor,’” said senior security researcher David Cohen.
“This silent infiltration could potentially grant access to critical internal systems and pave the way for large-scale data breaches or even corporate espionage, affecting not just individual users but potentially entire organizations around the world, all while leaving victims completely unaware of their compromised status.”
Specifically, the rogue model initiates a reverse shell connection to 210.117.212[.]93, an IP address belonging to the Korea Research Environment Open Network (KREONET). Other repositories with the same payload were observed connecting to other IP addresses.
In one case, the model’s authors urged users not to download it, raising the possibility that the publication could be the work of AI researchers or professionals.
“However, a fundamental principle in security research is to refrain from publishing real exploits or malicious code,” JFrog said. “This principle was violated when the malicious code attempted to reconnect to a genuine IP address.”
The findings once again highlight the threat lurking in open source repositories, which could be poisoned by nefarious activities.
From supply chain risks to zero-click worms
They also come as researchers have devised efficient ways to generate suggestions that can be used to elicit malicious responses from large language models (LLMs) using a technique called ray search-based adversarial attack (BEAST).
In a related development, security researchers have developed what is known as a generative AI worm called Morris II, capable of stealing data and spreading malware across multiple systems.
Morris II, a twist on one of the oldest computer worms, exploits self-replicating adversary suggestions encoded in inputs such as images and text which, when processed by GenAI models, can cause them to “replicate the input as output (replication) and engage in malicious activity (payload),” said security researchers Stav Cohen, Ron Bitton and Ben Nassi.
Even more concerning, models can be weaponized to provide malicious input to new applications by exploiting connectivity within the generative AI ecosystem.
The attack technique, called ComPromptMized, shares similarities with traditional approaches such as buffer overflow and SQL injection as it embeds code within a query and data in regions known to contain executable code.
ComPromptMized impacts applications whose execution flow depends on the output of a generative AI service, as well as those that use Retrieval Augmented Generation (RAG), which combines text generation models with a text retrieval component. information to enrich the answers to queries.
The study is not the first, nor will it be the last, to explore the idea of prompt injection as a way to attack LLMs and cause them to take unintentional actions.
Previously, academics have demonstrated attacks that use images and audio recordings to inject invisible “adversarial perturbations” into multimodal LLMs that cause the model to generate text or instructions of the attacker’s choosing.
“The attacker can lure the victim to a web page with an interesting image or send an email with an audio clip,” Nassi, along with Eugene Bagdasaryan, Tsung-Yin Hsieh and Vitaly Shmatikov, said in an article released late last year.
“When the victim directly inserts the image or clip into an isolated LLM and asks questions about it, the model will be guided by the instructions inserted by the attacker.”
Early last year, a group of researchers from the German CISPA Helmholtz Center for Cyber Security at Saarland University and Sequire Technology also discovered how an attacker could exploit LLM models by strategically inserting hidden prompts into the data (i.e., indirect prompt injection) that the model would likely retrieve when responding to user input.