This article describes how you can build a Generative AI Threat Protection program using the out of the box capabilities provided by Defender for Cloud. It is important to consider the security of your entire workload like we mentioned under https://techcommunity.microsoft.com/t5/microsoft-defender-for-cloud/securing-multi-cloud-gen-ai-workloads-using-azure-native/ba-p/4222728. In this article we will focus primarily on the Gen AI specific threats. We will take an example of a common threat like Jailbreaking and showcase end to end how Microsoft’s native solutions help you detect and prevent. You can enroll in the preview by following the steps https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-onboarding#enroll-in-the-limited-preview
Understanding Jailbreak attacks
Evasion attacks involve subtly modifying inputs (images, audio files, documents, etc.) to mislead models at inference time, making them a stealthy and effective means of bypassing inherent security controls in the AI Service.
Jailbreak can be considered a type of evasion attack. The attack involves crafting inputs that cause the AI model to bypass its safety mechanisms and produce unintended or harmful outputs.
Attackers can use techniques like crescendo to bypass security filters for example creating a recipe for Molotov Cocktail. Due to the nature of working with human language, generative capabilities, and the data used in training the models, AI models are non-deterministic, i.e., the same input will not always produce the same outputs.
- A “classic” jailbreak happens when an authorized operator of the system crafts jailbreak inputs in order to extend their own powers over the system.
- Indirect prompt injection happens when a system processes data controlled by a third party (e.g., analyzing incoming emails or documents editable by someone other than the operator) who inserts a malicious payload into that data, which then leads to a jailbreak of the system.
There are various types of jailbreak-like attacks. Some, like DAN, involve adding instructions to a single user input, while others, like Crescendo, operate over multiple turns, gradually steering the conversation towards a specific outcome. Therefore, jailbreaks should be seen not as a single technique but as a collection of methods where a guardrail can be circumvented by a carefully crafted input.
Understanding Native protections against Jailbreak
Defender for Cloud’s AI Threat Protection (https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-threat-protection) feature integrates with Azure Open AI and reviews the prompt and response for suspicious behavior (https://learn.microsoft.com/en-us/azure/defender-for-cloud/alerts-ai-workloads)
In case of Jailbreak, the solution integrates with Azure Open AI’s Content Filter Prompt Shields (https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter), which uses an ensemble of multi-class classification models to detect four categories of harmful content (violence, hate, sexual, and self-harm) at four severity levels respectively (safe, low, medium, and high), and optional binary classifiers for detecting jailbreak risk, existing text, and code in public repositories.
When Prompt Shield detects a Jailbreak attempt, it filters / annotate the user’s prompt. Defender for Cloud then picks up this information and makes it available to the security teams.
Note that User Prompts are protected from Direct Attacks like Jailbreak by default. As a result, once you enable Threat Protection for AI in Defender for Cloud your security teams will have complete visibility on these.
Fig 1. Threat Protection for AI alert
Tangible benefits for your Security Teams
Since the Defender for Cloud is doing the undifferentiated heavy lifting here your Security Governance, Architecture, and Operations all benefit like so,
Governance
- Content is available out of the box and is enabled by default in several critical risk scenarios. This helps meet your AI security controls like OWASP LLM 01: Prompt Injection (https://genai.owasp.org/llmrisk/llm01-prompt-injection/)
- You can further refine the Content Filter levels for each model running in AI Foundry depending on the risk such as the data model accesses (RAG), public exposure, etc.
- The application of the control is enabled by default
- The Control reporting is available out of the box and can/will follow the existing workflow that you have set up for remainder of your cloud workloads
- Defender for Cloud provides Governance Framework
Architecture
- Threat Protection for AI can be enabled at subscription level so the service scales with your workloads and provides coverage for any new deployments
- There is native integration with Azure Open AI so you do not need to write and manage custom patterns unlike a third party service
- The service is not in-line so you do not have to worry about downstream impact on the workload
- Since Threat Protection for AI is a capability within Defender for Cloud, you do not need to define specific RBAC permissions for users or service
- The alerts from the capability will automatically follow the export flow you have set up for the rest of the Defender for Cloud capabilities.
Operations
- The alerts are already ingested in the Microsoft XDR portal so you can continue threat hunting without learning new tools there by maximizing your existing skills
- You can set up Workflow Automation to respond to AI alerts much like alerts from other capabilities like Defender for Storage. So, your overall logic app patterns can be reused with small tweaks
- Since your SOC analyst might still be learning Gen AI threats and your playbooks might not be up to date, the alerts (see Fig 1 above) contain steps that they should take to resolve
- The alerts are available in XDR portal, which you might already be familiar with so won’t have to learn a new solution
Fig 2. Alerts in XDR Portal
- The alerts contain the prompt as an evidence in addition to other relevant attributes like IP, user details, targeted resource. This helps you quickly triage the alerts
Fig 3. Prompt Evidence captured as part of the alert
- You can train the model using the detected prompts to block any future responses on similar user prompts
Summary
Threat Protection for AI:
- Provides holistic coverage of your Gen AI workloads
- Helps you maximize the investment in Microsoft Solutions
- Reduces the need for learning another solution to protect another new workloads
- Drives overall cost, time, and operational efficiencies
- Enroll in the preview https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-onboarding#enroll-in-the-limited-preview
Published Feb 18, 2025
Version 1.0singhabhi
Microsoft
Joined January 23, 2023
Microsoft Defender for Cloud Blog
Follow this blog board to get notified when there's new activity