
Microsoft Defender for Cloud Blog

Protecting Azure AI Workloads using Threat Protection for AI in Defender for Cloud

singhabhi
Feb 18, 2025

This article describes how you can build a Generative AI threat protection program using the out-of-the-box capabilities provided by Defender for Cloud. It is important to consider the security of your entire workload, as discussed in https://techcommunity.microsoft.com/t5/microsoft-defender-for-cloud/securing-multi-cloud-gen-ai-workloads-using-azure-native/ba-p/4222728. In this article we will focus primarily on Gen AI specific threats. We will take a common threat like jailbreaking and show, end to end, how Microsoft’s native solutions help you detect and prevent it. You can enroll in the preview by following the steps at https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-onboarding#enroll-in-the-limited-preview

Understanding Jailbreak attacks

Evasion attacks involve subtly modifying inputs (images, audio files, documents, etc.) to mislead models at inference time, making them a stealthy and effective means of bypassing inherent security controls in the AI Service.

A jailbreak can be considered a type of evasion attack. The attack involves crafting inputs that cause the AI model to bypass its safety mechanisms and produce unintended or harmful outputs.

Attackers can use techniques like Crescendo to bypass security filters, for example, to obtain a recipe for a Molotov cocktail. Due to the nature of working with human language, generative capabilities, and the data used in training the models, AI models are non-deterministic, i.e., the same input will not always produce the same output.

  • A “classic” jailbreak happens when an authorized operator of the system crafts jailbreak inputs in order to extend their own powers over the system.
  • Indirect prompt injection happens when a system processes data controlled by a third party (e.g., analyzing incoming emails or documents editable by someone other than the operator) who inserts a malicious payload into that data, which then leads to a jailbreak of the system.

 

There are various types of jailbreak-like attacks. Some, like DAN, involve adding instructions to a single user input, while others, like Crescendo, operate over multiple turns, gradually steering the conversation towards a specific outcome. Therefore, jailbreaks should be seen not as a single technique but as a collection of methods where a guardrail can be circumvented by a carefully crafted input.

Understanding native protections against Jailbreak

 

Defender for Cloud’s AI Threat Protection (https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-threat-protection) feature integrates with Azure OpenAI and reviews prompts and responses for suspicious behavior (https://learn.microsoft.com/en-us/azure/defender-for-cloud/alerts-ai-workloads).

In the case of jailbreak, the solution integrates with Azure OpenAI’s content filtering and Prompt Shields (https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter). The content filtering system uses an ensemble of multi-class classification models to detect four categories of harmful content (violence, hate, sexual, and self-harm) at four severity levels (safe, low, medium, and high), along with optional binary classifiers for detecting jailbreak risk, existing text, and code in public repositories.

When Prompt Shields detects a jailbreak attempt, it filters or annotates the user’s prompt. Defender for Cloud then picks up this information and makes it available to your security teams.
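To make this concrete, here is a minimal Python sketch of what that annotation looks like from the application side, using the openai package against an Azure OpenAI deployment. The deployment name, API version, and the prompt_filter_results / jailbreak field names are assumptions taken from the public content filtering documentation; verify them against the API version you use.

```python
# Minimal sketch: inspect the Prompt Shields / content filter annotation that
# Azure OpenAI attaches to a chat completion. Endpoint, key, API version, and
# the deployment name "gpt-4o" are placeholders/assumptions.
import os
from openai import AzureOpenAI, BadRequestError

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[{"role": "user", "content": "How do I reset my password?"}],
    )
    # Prompt-level annotations are returned alongside the completion; field
    # names follow the content filtering documentation linked above.
    for prompt_result in response.model_dump().get("prompt_filter_results", []):
        jailbreak = prompt_result.get("content_filter_results", {}).get("jailbreak", {})
        print("jailbreak detected:", jailbreak.get("detected", False))
except BadRequestError as err:
    # Prompts that the filter blocks outright come back as a 400 error with a
    # content_filter error code rather than a normal completion.
    print("request was filtered:", err)
```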

Note that user prompts are protected from direct attacks like jailbreak by default. As a result, once you enable Threat Protection for AI in Defender for Cloud, your security teams have complete visibility into these attempts.


Fig 1. Threat Protection for AI alert

 

Tangible benefits for your Security Teams

Since Defender for Cloud does the undifferentiated heavy lifting here, your security governance, architecture, and operations functions all benefit, as described below.

Governance

  • Content filtering is available out of the box and is enabled by default in several critical risk scenarios. This helps you meet AI security controls like OWASP LLM01: Prompt Injection (https://genai.owasp.org/llmrisk/llm01-prompt-injection/)
  • You can further refine the content filter levels for each model running in AI Foundry depending on the risk, such as the data the model accesses (RAG), public exposure, etc.
  • The application of the control is enabled by default
  • Control reporting is available out of the box and follows the existing workflow you have set up for the remainder of your cloud workloads
  • Defender for Cloud provides a governance framework to assign and track remediation

 

Architecture

  • Threat Protection for AI can be enabled at the subscription level, so the service scales with your workloads and provides coverage for any new deployments (see the sketch after this list)
  • There is native integration with Azure OpenAI, so you do not need to write and manage custom patterns as you would with a third-party service
  • The service is not in-line, so you do not have to worry about downstream impact on the workload
  • Since Threat Protection for AI is a capability within Defender for Cloud, you do not need to define specific RBAC permissions for users or services
  • The alerts from the capability automatically follow the export flow you have set up for the rest of the Defender for Cloud capabilities
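As a rough illustration of the subscription-level enablement mentioned in the first bullet, the sketch below turns on the Defender plan covering AI workloads through the Microsoft.Security/pricings ARM API. The plan name "AI" and the api-version are assumptions to confirm against the onboarding documentation linked at the top of this article; in practice you may prefer the portal or Azure Policy.

```python
# Minimal sketch: enable the Defender for Cloud plan covering AI workloads on
# a subscription. Plan name "AI" and api-version "2024-01-01" are assumptions.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<your-subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.Security/pricings/AI?api-version=2024-01-01"
)
resp = requests.put(
    url,
    json={"properties": {"pricingTier": "Standard"}},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print("Plan tier:", resp.json()["properties"]["pricingTier"])
```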

Operations

  • The alerts are already ingested into the Microsoft Defender XDR portal, so you can continue threat hunting without learning new tools, thereby maximizing your existing skills (a sketch of retrieving these alerts programmatically appears after this list)
  • You can set up workflow automation to respond to AI alerts much like alerts from other capabilities such as Defender for Storage, so your overall Logic App patterns can be reused with small tweaks
  • Since your SOC analysts might still be learning Gen AI threats and your playbooks might not be up to date, the alerts (see Fig 1 above) contain the steps they should take to resolve them
  • The alerts are available in the XDR portal, which you might already be familiar with, so you won’t have to learn a new solution


Fig 2. Alerts in XDR Portal

  • The alerts contain the prompt as evidence, in addition to other relevant attributes like the IP address, user details, and the targeted resource. This helps you quickly triage the alerts


Fig 3. Prompt Evidence captured as part of the alert

  • You can use the detected prompts to tune your content filters and blocklists so that similar user prompts are blocked in the future
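If your team also wants to pull these alerts programmatically, for example to enrich a ticket or feed a custom report, a minimal sketch with the azure-mgmt-security package is shown below. Filtering on an "AI"-prefixed alertType is an assumption; check the exact values in the alerts reference linked earlier.

```python
# Minimal sketch: list Defender for Cloud alerts for a subscription and keep
# the AI workload ones for triage. The "AI" alertType prefix is an assumption.
from azure.identity import DefaultAzureCredential
from azure.mgmt.security import SecurityCenter

subscription_id = "<your-subscription-id>"  # placeholder
client = SecurityCenter(DefaultAzureCredential(), subscription_id)

for alert in client.alerts.list():
    if (alert.alert_type or "").startswith("AI"):
        print(alert.time_generated_utc, alert.severity, alert.alert_display_name)
```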

 

Summary

Threat Protection for AI in Defender for Cloud gives you out-of-the-box detection of Gen AI specific threats like jailbreak, builds on the native integration with Azure OpenAI’s content filtering and Prompt Shields, and surfaces the resulting alerts, complete with prompt evidence and remediation steps, in the tools your governance, architecture, and operations teams already use.

Published Feb 18, 2025
Version 1.0