Hardening AI training datasets against malicious poisoning


Picture this: It’s a Saturday morning and you’ve made breakfast for your family. The pancakes were golden brown and apparently tasted good, but everyone, including you, got sick shortly after eating them. Unbeknownst to you, the milk you used to make the batter expired several weeks ago. The quality of the ingredients affected the meal, but on the outside everything looked fine.

The same philosophy applies to artificial intelligence (AI). Regardless of its purpose, the quality of AI's output is directly tied to the quality of its input. As AI adoption continues to grow, so do security concerns about the data being fed into these systems.

Most organizations today are integrating AI into business operations to some extent, and threat actors are taking note. In recent years, a tactic known as AI poisoning has become increasingly widespread. This malicious practice involves inserting deceptive or malicious data into AI training sets. The tricky part about AI poisoning is that, even though the input is compromised, the output can initially appear normal. Only once a threat actor gains firm control of the data and launches a full-fledged attack do deviations from the norm become apparent. Consequences range from mild inconvenience to lasting damage to a brand's reputation.
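To make the mechanics concrete, here is a toy sketch of one well-known poisoning technique, label flipping, using scikit-learn. The messages, labels, and test sample are all invented for illustration; real attacks are far subtler and unfold gradually.

```python
# Toy illustration of label-flipping data poisoning.
# All messages, labels and the test sample are invented for illustration.
# Requires scikit-learn (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now", "claim your free reward", "cheap pills online",
    "meeting moved to 3pm", "quarterly report attached", "lunch on friday",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = legitimate

def train(msgs, lbls):
    vec = CountVectorizer()
    return vec, MultinomialNB().fit(vec.fit_transform(msgs), lbls)

# Clean model, trained on trustworthy labels.
vec_clean, clean_model = train(messages, labels)

# An attacker with write access quietly relabels two spam samples as
# legitimate -- the inputs themselves look unchanged.
poisoned = [0, 1, 0, 0, 0, 0]
vec_bad, poisoned_model = train(messages, poisoned)

test = ["win a free prize"]
print(clean_model.predict(vec_clean.transform(test)))   # likely [1]: spam
print(poisoned_model.predict(vec_bad.transform(test)))  # likely [0]: waved through
```

Nothing about the poisoned training file looks wrong on casual inspection, which is exactly why provenance tracking and behavioral monitoring, described below, matter.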

It’s a risk that affects organizations of all sizes, even today’s largest technology vendors. For example, in recent years, hackers have launched several large-scale attacks to poison Google’s Gmail spam filters and even made Microsoft’s Twitter chatbot hostile.

Defend against AI data poisoning

Fortunately, organizations can take the following steps to protect AI technologies from potential poisoning.

  • Build a comprehensive data catalog. First, organizations should create a real-time data catalog that serves as a centralized inventory of all the information fed into their AI systems. Whenever new data is added, it should be tracked in this index. The catalog should also classify incoming data by who supplied it, what it contains, and when, where, why and how it entered the pipeline, ensuring transparency and accountability (a minimal sketch of such a catalog entry follows this list).

  • Develop a baseline of normal behavior for users and devices that interact with AI data. Once IT and security teams have a thorough understanding of all the data in their AI systems and who has access to it, they should establish what normal user and device behavior looks like.

Compromised credentials are one of the easiest ways for cybercriminals to penetrate networks. All a threat actor has to do is play a guessing game or buy one of the roughly 24 billion username and password combinations available on the cybercriminal market. Once access is gained, a threat actor can easily reach the AI training datasets.

By establishing a baseline of user and device behavior, security teams can readily detect anomalies that may indicate an attack, often stopping a threat actor before an incident escalates into a full-blown data breach. For example, suppose an IT executive who oversees AI training datasets typically works from the New York office. One day, that account becomes active in another country and begins adding large amounts of data to the AI system. If the security team already has a behavioral baseline in place, it can quickly flag this as anomalous, contact the executive to verify the activity and, if it cannot be verified, temporarily disable the account until the alert is thoroughly investigated. Simple sketches of both the catalog and this kind of baseline check follow.
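As promised above, here is a minimal sketch of what a catalog entry might look like. The field names, the helper function, and the dataset path are all hypothetical, chosen only to show the who/what/when/where/why/how classification and a content hash that makes later tampering detectable.

```python
# Minimal sketch of a training-data catalog entry.
# Field names and the example values are illustrative, not from any product.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

@dataclass
class CatalogEntry:
    who: str    # user or service account that added the data
    what: str   # description of the dataset
    where: str  # source system or path
    why: str    # business justification
    how: str    # ingestion method (API upload, batch job, etc.)
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    sha256: str = ""  # content fingerprint, recomputed to detect tampering

def register(catalog: list, entry: CatalogEntry, content: bytes) -> None:
    """Record a new dataset and fingerprint its contents."""
    entry.sha256 = hashlib.sha256(content).hexdigest()
    catalog.append(entry)

catalog: list[CatalogEntry] = []
register(
    catalog,
    CatalogEntry(
        who="jdoe@example.com",
        what="customer support transcripts, Q3",
        where="s3://example-bucket/support/q3.csv",  # hypothetical path
        why="fine-tuning the support chatbot",
        how="scheduled batch import",
    ),
    content=b"...raw dataset bytes...",
)
```

Recomputing the stored hash before each training run is a cheap way to confirm that a dataset has not been altered since it was cataloged.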
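And here is a minimal sketch of the baseline check from the executive scenario above. The thresholds, the event fields, and the per-user baseline structure are invented; a production system would learn these from historical telemetry rather than hard-code them.

```python
# Minimal sketch of a behavioral-baseline check.
# Thresholds and the event format are invented for illustration.
BASELINES = {
    "jdoe@example.com": {
        "usual_countries": {"US"},
        "max_daily_upload_mb": 500,  # would be learned from history
    },
}

def anomaly_reasons(user: str, country: str, upload_mb: float) -> list[str]:
    """Return the ways an event deviates from the user's baseline."""
    base = BASELINES.get(user)
    if base is None:
        return ["no baseline exists for this user"]
    reasons = []
    if country not in base["usual_countries"]:
        reasons.append(f"activity from unusual country: {country}")
    if upload_mb > base["max_daily_upload_mb"]:
        reasons.append(f"unusually large upload: {upload_mb} MB")
    return reasons

# The scenario above: the executive's account appears abroad and
# pushes a large volume of data into the training pipeline.
alerts = anomaly_reasons("jdoe@example.com", country="RO", upload_mb=4200)
if alerts:
    print("Flag for review, verify with the user, or suspend the account:")
    for reason in alerts:
        print(" -", reason)
```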

Take responsibility for AI training sets

Just as you should check the quality of your ingredients before preparing a meal, it is critical to verify the integrity of your AI training data. An AI system is only as good as the data it processes. Strong guidelines, policies, monitoring systems, and algorithms play a critical role in keeping AI safe and effective. These measures guard against potential threats and enable organizations to harness AI's transformative potential. It is a delicate balance: organizations must learn to leverage the capabilities of AI while remaining vigilant in the face of an ever-evolving threat landscape.


