AI LLM attacks & how – Part 2

Here’s the second part of my presentation in Microsoft AI Summit Finland on October 31st, 2024.

And some extra since this second part is so late. I blame finnish Hell, I mean the endless darkness from October to the end of January.

Protect AI workloads against threats

So what does it mean? There are different tools/applications to see what might happen to your AI applications and how they can protected proactively not reactively. Of course that can be done too but in worst scenario that is too late.

Security teams can now detect new threats to AI workloads that use Azure OpenAI Service
This includes threats such as prompt injection attacks, credential theft, and sensitive data exfiltration
Enriched security alerts provide insights into IP address, malicious prompt evidence that triggered the alert, as well as details on sensitive information types or credential accessed
In addition, correlate them into incidents in Defender XDR or integrate into their SIEM solution of choice

Threat protection for AI workloads in Azure

Microsoft Defender for Cloud has a new plan for AI workloads.

Threat protection for AI workloads in Azure involves a combination of security measures, services, and best practices that work together to safeguard data, models, and infrastructure.

Azure AI workloads threat protection can detect and respond to attacks, such as jailbreak attacks, sensitive data exposure, and credential theft.

It can also receive contextual security alerts with supporting evidence including:

IP address
Malicious prompt evidence
Sensitive information types or credentials accessed.

To enable threat protection for AI workloads:

Sign in to the Azure portal.
Search for and select Microsoft Defender for Cloud.
In the Defender for Cloud menu, select Environment settings.
Select the relevant Azure subscription.
On the Defender plans page, toggle the AI workloads to On.

Also with the AI workloads threat protection plan enabled, you can control whether alerts include suspicious segments directly from your user’s prompts, or the model responses from your AI applications or resources. Enabling user prompt evidence helps you to triage and classify alerts and your user’s intentions.

User prompt evidence consists of prompts and model responses. Both are considered your data. Evidence is available through the Azure portal, Defender portal, and any attached partners integrations.

Sign in to the Azure portal.
Search for and select Microsoft Defender for Cloud.
In the Defender for Cloud menu, select Environment settings.
Select the relevant Azure subscription.
Locate AI workloads and select Settings.
Toggle Enable user prompt evidence to On.

Here’s how threat protection typically works for AI workloads in Azure:

Defender for Cloud continuously monitors Azure resources and provides security recommendations and alerts.

It uses advanced analytics and machine learning to detect threats, such as unusual activity patterns that could indicate a security breach.

Monitoring and Logging: Azure Monitor and Azure Log Analytics provide visibility into the health, performance, and security of your AI workloads. They enable you to collect logs and metrics from your resources, facilitating proactive threat detection and troubleshooting.

Below is an example of AI Alert which can see from Defender for Cloud:

Threat protection for AI workloads in Defender XDR

Threat protection for AI workloads integrates with Defender XDR, enabling security teams to centralize alerts on AI workloads within the Defender XDR portal.

Correlate and contextualize malicious events into robust security incidents with XDR and SIEM tool integrations.

Availibility at the moment is limited. You can request access limited public preview program.

Summary for AI Threat Protection

Defender for Cloud enables organizations to securely develop, deploy and run Gen-AI applications across the applications lifecycle and multiple cloud environments.

Security Posture for AI – agentless ability to discover and reduce risks to GenAI enterprise-built applications in Azure and AWS from code to cloud using new recommendations, attack path analysis and Infrastructure as code scanning (shift left) to discover AI related misconfigurations and vulnerable code repositories.

Threat Protection for AI – agentless ability to detect and respond to attacks targeting Gen-AI cloud native applications in Azure with seamless integration with Azure AI, Microsoft Threat Intelligence and Defender XDR.

Defender for Cloud mission is to empower organizations to secure the entire lifecycle of their cloud native applications.

Azure AI Foundry

Azure AI Foundry – Safety and security measures help AI systems avoid harmful content, bias, misuse, and unintended risks. Prioritizing safety and security empowers you to build AI solutions that your enterprise and your customers can trust. https://ai.azure.com/

It is designed for developers to:

Build generative AI applications on an enterprise-grade platform.
Explore, build, test, and deploy using cutting-edge AI tools and ML models, grounded in responsible AI practices.
Collaborate with a team for the full life-cycle of application development.

Azure AI Foundry has a possibility to create Content Filters. It works by running both the prompt input and completion output through an ensemble of classification models aimed at detecting and preventing the output of harmful content.

Uses cases for software developers

User prompts submitted to a generative AI service.
Content produced by generative AI models.
Online marketplaces that moderate product catalogs and other user-generated content.
Gaming companies that moderate user-generated game artifacts and chat rooms.
Social messaging platforms that moderate images and text added by their users.
Enterprise media companies that implement centralized moderation for their content.
K-12 education solution providers filtering out content that is inappropriate for students and educators.

Azure AI Content Safety Studio

Defender for Cloud’s AI threat protection works with Azure AI Content Safety Prompt Shields and Microsoft’s threat intelligence to provide security alerts for threats like data leakage, data poisoning, jailbreak, and credential theft.

Azure AI Content Safety Studio is an own web site, https://contentsafety.cognitive.azure.com/

It is an online tool designed to handle potentially offensive, risky, or undesirable content using cutting-edge content moderation ML models. It provides templates and customized workflows, enabling users to choose and build their own content moderation system. Users can upload their own content or try it out with provided sample content.

Content Safety Studio not only contains out-of-the-box AI models but also includes Microsoft’s built-in terms blocklists to flag profanities and stay up to date with new content trends. You can also upload your own blocklists to enhance the coverage of harmful content that’s specific to your use case.

Product templates and use-cases

Azure AI Content Safety product templates are described here. These links take you to Microsoft Learn pages.

Prompt Shields

Prompt Shields is a technique designed to enhance the safety and reliability of AI language models. It involves creating a series of defensive mechanisms, like filters and checks, that are applied to the prompts and outputs of an AI model. These mechanisms aim to prevent the model from generating harmful, biased, or otherwise undesirable content.

Groundedness detection (preview)

Groundedness detection in the context of AI language models refers to the ability of the model to evaluate whether its responses are based on factual, reliable, and verifiable information. This is particularly important for applications where accuracy and trustworthiness are crucial, such as in education, healthcare, or news dissemination.

Protected material detection (preview)

Protected material detection refers to the process of identifying and managing content that is legally or ethically sensitive, such as copyrighted, confidential, or private information. In the context of AI and machine learning, this involves ensuring that AI models do not inadvertently generate or replicate such material, which could lead to legal issues or breaches of privacy.

Custom categories (preview)

Azure AI Content Safety lets you create and manage your own content moderation categories for enhanced moderation and filtering that matches your specific policies or use cases.

Harm categories

Here are some common harm categories: (more on MS learn)

Race, ethnicity, nationality
Gender identity groups and expression
Sexual orientation
Religion
Personal appearance and body size
Disability status
Harassment and bullying

Secure App Development

Microsoft provides comprehensive code to runtime security.

Starting with secure development, we have native security controls infused in the existing developer workflows to help you code and build applications securely.

With contextual posture management, we help you prioritize and reduce risk continuously across the entire cloud application lifecycle.

Protect your clouds against evolving threats with near real-time detections for cloud and AI workloads, data and APIs in a unified XDR experience where you can enable correlation and advanced Copilot powered response actions across your entire digital estate.

AppSec = Application Security (code scanning, supply chain security)

CI/CD security = security for the developer and DevOps environments

CSPM = Cloud security posture management

CIEM = Cloud identity entitlement management

CWP = Cloud workload protection

CDR = Cloud detection and response

SDL = Security Development Lifecycle

Here (below) we can see how AI-Security Posture Management can provide visibility into your GenAI stack and your connected resources to reduce misconfiguration that can expose sensitive grounding data

Microsoft AI RED Teaming & Tools

Microsoft AI red teaming refers to the practice of emulating real-world adversaries and their tools, tactics, and procedures to identify risks, uncover blind spots, validate assumptions, and improve the overall security posture of systems. AI systems inherit new security vulnerabilities, such as prompt injection and poisoning, which need special attention. AI Red teaming is a best practice in the responsible development of systems and features using LLMs.

AI Red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.

AI red teaming is now an umbrella term for both security and Responsible AI (RAI).

Open automation framework, PyRIT can help automate the process of identifying risks in AI systems.

Bug Bar – Vulnerability Severity Classification

Threat Modeling AI/ML Systems and Dependencies

Adversarial Machine Learning Threat Taxonomy

AI Risk Assessment for ML Engineers – Find out the severity levels in ML models

Security Copilot

Then there is Security Copilot. I will cover this more on coming blog which is a refresher for my previous one. Here’s some use cases what you can do with it:

Incident summarization – Gain context for incidents and improve communication across your organization by leveraging generative AI to swiftly distill complex security alerts into concise, actionable summaries
Impact analysis – Utilize AI-driven analytics to assess the potential impact of security incidents, offering insights into affected systems and data to prioritize response efforts effectively.
Reverse engineering of scripts – Eliminate the need to manually reverse engineer malware and enable every analyst to understand the actions executed by attackers.
Analyze complex command line scripts and translate them into natural language with clear explanations of actions. Efficiently extract and link indicators found in the script to their respective entities in your environment.
Guided response – Receive actionable step-by-step guidance for incident response, including directions for triage, investigation, containment, and remediation. Relevant deep links to recommended actions allow for quicker response.

Useful links

AI & Machine Learning protection posts on Microsoft blog

How Microsoft discovers and mitigates evelving attacks against AI guardrails

Alerts for AI workloads

Redteaming Large Language models (LLMs)

PyRIT Framework blog

Guide for building AI red teams for LLMs

PyRIT – how to guide

What is Security Copilot

MS Learn training path for Security Copilot

Key take-aways

For infrastructure:

Secure your identity (with EntraID policies) – ”Identity is the new Firewall”
Enable security guardrails in your AI environment
Use AI Security Posture Management in Azure
Use Content Safety filters in Azure Open AI

For developers:

Require system developers to ensure that safety brakes are built by design into the use of AI systems for the control of infrastructure
Use Microsoft SDL (Security Development Lifecycle)
Use vulnerability scanners against code repositories

Summary

Year 2024 was definitely the year of AI. Yet there are practises that take some AI solution to production but does not secure it in any way. That can cost a lot. We security people need to secure them or at least give guidance to the company decision-makers that AI solutions MUST BE PROTECTED.

Here’s a list of some of the significant AI-related incidents and attacks:

Microsoft Tay (2016): Tay was an AI chatbot released by Microsoft on Twitter, designed to learn from interactions with users. However, it was quickly manipulated by users who fed it offensive and inappropriate content, leading Tay to produce racist and inflammatory tweets.
Deepfake Technology (2018-present): The rise of deepfake technology, which uses AI to create realistic fake videos, has led to numerous incidents where individuals have been impersonated, often in damaging ways. High-profile cases include deepfakes of political figures and celebrities.
Tesla Autopilot Incidents (Various Years): Tesla’s Autopilot system, an AI-driven driver assistance feature, has been involved in several accidents. Critics have pointed out that the system can be tricked or misused, leading to dangerous situations.
Adversarial Attacks on Image Recognition (2010s-present): Researchers have demonstrated that AI image recognition systems can be fooled by adversarial attacks, where slight, often imperceptible modifications to images lead the AI to make incorrect classifications.
Amazon Alexa and Google Home Privacy Concerns (2017-present): Incidents involving smart speakers like Amazon Alexa and Google Home have raised concerns about unintended recordings and data privacy. There have been reports of these devices capturing conversations without explicit activation.
GPT-3 Misuse (2020-present): OpenAI’s GPT-3, a powerful language model, has been used in various applications, some of which have raised ethical concerns. Misuse includes generating fake news articles, phishing emails, and other misleading content.
AI in Social Media Manipulation (2010s-present): AI-driven bots and algorithms have been used to manipulate social media platforms, spreading misinformation, and amplifying specific political agendas. Notable incidents include involvement in elections and public opinion manipulation.
Data Poisoning Attacks (Various Years): These attacks involve deliberately injecting malicious data into the training datasets of machine learning models, causing them to learn incorrect or harmful behaviors.
Facial Recognition Misidentification (2010s-present): There have been numerous reports of facial recognition technology misidentifying individuals, leading to wrongful arrests and privacy violations. This has raised concerns about bias and accuracy in AI systems.
AI in Autonomous Weapons (Ongoing Concerns): While not a specific incident, the potential misuse of AI in autonomous weapons systems has been a significant concern for ethicists and governments, leading to calls for regulation and bans on certain applications.

Share on Social Media

Discover more from Jussi Metso

Subscribe to get the latest posts sent to your email.

AI SECURITY

Table of Contents