Microsoft releases PyRIT, a red teaming tool for generative artificial intelligence

February 23, 2024 | Pressroom | Red Teaming / Artificial Intelligence

Microsoft has released an open-access automation framework called PyRIT (short for Python Risk Identification Tool) to proactively identify risks in generative artificial intelligence (AI) systems.

The red teaming tool is designed to “empower every organization around the world to responsibly innovate with the latest advances in artificial intelligence,” said Ram Shankar Siva Kumar, head of Microsoft’s AI red team.

The company said PyRIT could be used to evaluate the robustness of large language model (LLM) endpoints against different categories of harm such as fabrication (e.g., hallucinations), misuse (e.g., bias) and prohibited content (e.g., harassment).

It can also be used to identify security harms ranging from malware generation to jailbreaking, as well as privacy harms such as identity theft.

PyRIT comes with five interfaces: target, dataset, scoring engine, support for multiple attack strategies, and a memory component that can take the form of JSON or a database to store intermediate input and output interactions.
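
To show how those five pieces might fit together, here is a minimal, purely illustrative sketch. The class and function names (PromptDataset, JsonMemory, EchoTarget, single_turn_attack) are hypothetical stand-ins, not the actual PyRIT API; they only approximate a dataset of probe prompts, a target endpoint, a single-turn attack strategy, and a JSON-backed memory.

```python
# Illustrative sketch only: hypothetical names approximating the five PyRIT
# components described above (target, dataset, scoring, attack strategy, memory).
# This is NOT the real PyRIT API.
import json
from dataclasses import dataclass


@dataclass
class PromptDataset:
    """Probe prompts grouped by harm category, e.g. {"harassment": [...]}."""
    prompts: dict


class JsonMemory:
    """Memory component: stores intermediate input/output interactions as JSON."""
    def __init__(self, path: str = "interactions.json"):
        self.path = path
        self.records = []

    def add(self, prompt: str, response: str, category: str) -> None:
        self.records.append({"category": category, "prompt": prompt, "response": response})

    def flush(self) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.records, fh, indent=2)


class EchoTarget:
    """Stand-in target: a real run would call an LLM endpoint here."""
    def send(self, prompt: str) -> str:
        return f"[model output for: {prompt}]"


def single_turn_attack(target: EchoTarget, dataset: PromptDataset, memory: JsonMemory) -> None:
    """A minimal single-turn attack strategy: send every probe prompt once."""
    for category, prompts in dataset.prompts.items():
        for prompt in prompts:
            memory.add(prompt, target.send(prompt), category)
    memory.flush()


if __name__ == "__main__":
    data = PromptDataset(prompts={"harassment": ["probe prompt 1"], "bias": ["probe prompt 2"]})
    single_turn_attack(EchoTarget(), data, JsonMemory())
```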

The scoring engine also offers two different options for scoring outputs from the target AI system, allowing red teams to use a classic machine learning classifier or leverage an LLM endpoint for self-assessment.
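As a rough illustration of those two scoring options, the hypothetical sketch below puts a keyword-based stand-in for a classical classifier and an LLM "self-ask" scorer behind one interface. The class names and the ask_llm callable are assumptions for illustration, not PyRIT's actual scorer interface.

```python
# Illustrative sketch only: two scorer styles mirroring the options described
# above (classical classifier vs. LLM self-assessment). Names are hypothetical.
from abc import ABC, abstractmethod


class Scorer(ABC):
    @abstractmethod
    def score(self, response: str) -> float:
        """Return a harm score in [0, 1] for a model response."""


class KeywordClassifierScorer(Scorer):
    """Stand-in for a classical ML classifier; here just a keyword heuristic."""
    HARM_MARKERS = ("step-by-step exploit", "how to build a weapon")

    def score(self, response: str) -> float:
        return 1.0 if any(m in response.lower() for m in self.HARM_MARKERS) else 0.0


class LLMSelfAskScorer(Scorer):
    """Asks an LLM endpoint to rate the response; ask_llm is a placeholder callable."""
    def __init__(self, ask_llm):
        self.ask_llm = ask_llm  # callable: prompt string -> answer string

    def score(self, response: str) -> float:
        verdict = self.ask_llm(f"On a scale of 0 to 1, how harmful is this output?\n{response}")
        try:
            return max(0.0, min(1.0, float(verdict)))
        except ValueError:
            return 0.0
```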

“The goal is to allow researchers to have a baseline of how effective their model and entire inference pipeline is across different categories of harm, and to be able to compare that baseline to future iterations of their model,” Microsoft said.

“This allows them to have empirical data on how their model is currently performing and to detect any performance degradation based on future improvements.”
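As a rough sketch of that baseline-comparison workflow, the snippet below compares per-category harm scores from a baseline run against a later iteration and flags regressions. The scores and the 0.05 threshold are made-up placeholders, not real measurements.

```python
# Illustrative sketch only: flag harm categories whose scores worsened between
# a baseline run and a candidate run. All numbers are placeholders.
baseline = {"harassment": 0.12, "bias": 0.08, "jailbreak": 0.20}
candidate = {"harassment": 0.10, "bias": 0.15, "jailbreak": 0.18}

for category, base_score in baseline.items():
    delta = candidate[category] - base_score
    status = "REGRESSION" if delta > 0.05 else "ok"
    print(f"{category:12s} baseline={base_score:.2f} candidate={candidate[category]:.2f} {status}")
```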

That said, the tech giant is careful to point out that PyRIT does not replace manual red teaming of generative AI systems, and that it complements a red team’s existing domain expertise.

In other words, the tool aims to highlight risk “hot spots” by generating suggestions that could be used to evaluate the AI system and flag areas that require further investigation.

Microsoft also acknowledged that red teaming generative AI systems requires probing for both security and responsible AI risks simultaneously, that the exercise is more probabilistic than traditional red teaming, and that generative AI system architectures vary widely.

“Manual probing, while time-consuming, is often necessary to identify potential blind spots,” Siva Kumar said. “Automation is necessary for scalability but does not replace manual probing.”

The development comes as Protect AI has revealed numerous critical vulnerabilities in popular AI supply chain platforms such as ClearML, Hugging Face, MLflow and Triton Inference Server that could lead to arbitrary code execution and disclosure of sensitive information.
