Apps on Azure Blog

DeepSeek-R1 on Azure Container Apps Serverless GPUs

Cary_Chai (Microsoft)
Jan 29, 2025

The world of AI is evolving at a breakneck pace with new models constantly being created. With so much rapid innovation, it is essential to have the flexibility to quickly adapt applications to the latest models. This is where Azure Container Apps serverless GPUs come in.

Azure Container Apps is a managed serverless container platform that enables you to deploy and run containerized applications while reducing infrastructure management and saving costs.

With serverless GPU support, you get the flexibility to bring any containerized workload, including new language models, and deploy it to a platform that automatically scales with your customer demand. In addition, you get optimized cold starts, per-second billing, and reduced operational overhead, so you can focus on the core components of your applications when using GPUs. All the while, you can run your AI applications alongside your non-AI apps on the same platform, within the same environment, sharing networking, observability, and security capabilities.

DeepSeek-R1 and Ollama

DeepSeek-R1 is a model released just last week that has quickly gained traction in the AI space. In this blog post, we'll go over how you can easily deploy DeepSeek-R1 using Azure Container Apps serverless GPUs and Ollama. This guide showcases how to deploy DeepSeek-R1, but the same steps apply to any model you can find in Ollama's library.

Prerequisites

  • An Azure account with an active subscription.
  • Serverless GPU quota for Azure Container Apps. Request quota here.
    Note: All EA customers have quota for 1 T4 GPU by default. In the future, all EA customers will also have quota for 1 A100 by default, and all paying customers will get quota for 1 T4 GPU by default.

Deploy Azure Container Apps resources

  1. Go to the Azure Portal.

  2. Click Create a resource.

  3. Search for Azure Container Apps.

  4. Select Container App and Create.

  5. On the Basics tab, you can leave most of the defaults. For region, select West US 3, Australia East, or Sweden Central. These are the regions where Azure Container Apps serverless GPUs are supported.

  6. In the Container tab, fill in the following details. The container that will be deployed bundles Ollama and Open WebUI together; see the Open WebUI project for more details on the container. (A scripted alternative using the Azure SDK for Python is sketched after this list.)

    Field                  Value
    Image source           Docker Hub or other registries
    Image type             Public
    Registry login server  ghcr.io
    Image and tag          open-webui/open-webui:ollama
    Workload profile       Consumption
    GPU (preview)          Check the box
    GPU Type               T4

    (Note: A100 GPUs are also supported, but for this guide, we'll be using T4 GPUs.)

    *If you don't have quota for serverless GPUs in Azure Container Apps, request quota here.

  7. In the Ingress tab, fill in the following details:
    Field            Value
    Ingress          Enabled
    Ingress traffic  Accepting traffic from anywhere
    Target port      8080


  8. Select Review + Create at the bottom of the page, then select Create.
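
If you'd rather script the deployment than click through the portal, the same app can be created with the Azure SDK for Python. The following is a minimal sketch, not a verified recipe: it assumes the azure-mgmt-appcontainers package and an existing Container Apps environment that already has a serverless T4 GPU workload profile; my-rg, my-env, gpu-t4, and deepseek-r1-demo are placeholder names, and the exact workload-profile fields can vary by API version.

    # pip install azure-identity azure-mgmt-appcontainers
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.appcontainers import ContainerAppsAPIClient
    from azure.mgmt.appcontainers.models import (
        Configuration, Container, ContainerApp, Ingress, Template,
    )

    # Placeholders -- substitute your own values.
    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "my-rg"
    # An existing Container Apps environment with a serverless GPU workload profile.
    ENVIRONMENT_ID = (
        f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
        "/providers/Microsoft.App/managedEnvironments/my-env"
    )

    client = ContainerAppsAPIClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    app = ContainerApp(
        location="westus3",                     # a region with serverless GPU support
        managed_environment_id=ENVIRONMENT_ID,
        workload_profile_name="gpu-t4",         # assumed name of the T4 profile
        configuration=Configuration(
            ingress=Ingress(external=True, target_port=8080),
        ),
        template=Template(
            containers=[
                Container(
                    name="open-webui",
                    image="ghcr.io/open-webui/open-webui:ollama",
                )
            ]
        ),
    )

    poller = client.container_apps.begin_create_or_update(
        RESOURCE_GROUP, "deepseek-r1-demo", app
    )
    print(poller.result().latest_revision_fqdn)  # the app's public hostname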

Access Ollama Web UI

  1. Once your deployment is complete, select Go to resource.

  2. Select the Application Url for your container app. This launches the app and starts the container.

    Note: You will see a longer cold start in this tutorial. Cold start for Azure Container Apps serverless GPUs is optimized when using Azure Container Registry (ACR) with artifact streaming enabled. To get faster cold start times, you can pull the image into your own ACR and enable artifact streaming. Steps are here.

Use DeepSeek-R1

  1. Once your container starts up, follow the prompts to get started.

  2. You will land on the Open WebUI chat page. Click Select a model in the top-left corner and enter deepseek-r1:14b into the search box. This is the 14-billion-parameter Qwen-based model. Alternatively, you can use the Llama-based models such as deepseek-r1:8b. For a full list of available models, see the Ollama library.

  3. Select Pull "deepseek-r1:14b" from Ollama.com. The UI shows the progress of the model download. (A scripted equivalent using Ollama's REST API is sketched after this list.)

  4. Once downloaded, open the Select a model dropdown in the top left again and select deepseek-r1:14b.

  5. Use the central chat box to begin using the model. An example prompt is: What are the benefits of Azure Container Apps?
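
If you want to script these steps instead of clicking through the Web UI, Ollama itself exposes a small REST API for pulling and chatting with models. Below is a minimal Python sketch; it assumes Ollama's API port (11434 by default) is reachable, for example when running the same open-webui:ollama image locally or when you additionally expose that port on your container app (the portal setup above only exposes the Web UI on port 8080).

    # pip install requests
    import requests

    OLLAMA_URL = "http://localhost:11434"  # replace with your Ollama endpoint
    MODEL = "deepseek-r1:14b"

    # Pull the model (the scripted equivalent of the "Pull ... from Ollama.com"
    # button). The pull endpoint streams progress as JSON lines.
    with requests.post(f"{OLLAMA_URL}/api/pull",
                       json={"model": MODEL}, stream=True, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                print(line.decode())  # e.g. {"status":"pulling manifest"} ...

    # Chat with the model once the pull completes.
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": MODEL,
            "messages": [{"role": "user",
                          "content": "What are the benefits of Azure Container Apps?"}],
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])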

Congratulations!

You have now successfully gotten up and running with DeepSeek-R1 on Azure Container Apps! As mentioned previously, the same steps apply to any model you can find in Ollama's library. In addition, Azure Container Apps is a completely workload-agnostic compute platform: you can bring any Linux-based container for your AI workloads and run it on serverless GPUs.

Please comment below to let us know what you think of the experience and any AI workloads you're deploying to Azure Container Apps. 

Next Steps

Azure Container Apps containers are ephemeral by default and don't include mounted storage. To persist your data and conversations, you can add a volume mount to your Azure Container App. For steps on how to add a volume mount, follow the steps here.
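
In SDK terms, picking up the sketch from earlier, persistence means adding a volume to the app template and mounting it into the container. This is again a hedged sketch: it assumes an Azure Files storage definition named ollama-files has already been added to your Container Apps environment, and all names are placeholders. Ollama stores downloaded models under /root/.ollama, so mounting the share there keeps models across restarts.

    # Assumes the azure-mgmt-appcontainers models from the earlier sketch.
    from azure.mgmt.appcontainers.models import (
        Container, Template, Volume, VolumeMount,
    )

    template = Template(
        volumes=[
            # "ollama-files" is a placeholder for an Azure Files storage
            # definition already registered on the environment.
            Volume(name="ollama-data",
                   storage_type="AzureFile",
                   storage_name="ollama-files"),
        ],
        containers=[
            Container(
                name="open-webui",
                image="ghcr.io/open-webui/open-webui:ollama",
                volume_mounts=[
                    # Ollama keeps downloaded models under /root/.ollama.
                    VolumeMount(volume_name="ollama-data",
                                mount_path="/root/.ollama"),
                ],
            )
        ],
    )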

To get faster cold start times, you can also pull the image into your own ACR and enable artifact streaming. See the tutorial for enabling artifact streaming.

Updated Jan 29, 2025

Comments
  • Great sharing Cary_Chai 
    Wanted to check whether a similar solution would also apply to AKS running on Azure Local with a dedicated GPU installed?

  • Ekurt (Copper Contributor)

    Thx Cary_Chai for this documentation, 

    How can I ensure that I don't have to reinstall all models after restarting the container app?
    I am currently facing this problem (I have added a storage account as a volume).

  • juwu (Copper Contributor)

    Can we use the 671b model? What GPU option do we need to use if we want to try the real model?

  • What is the exact cost for DeepSeek on ACA? Just like ordinary GPU-enabled ACA costing?

  • aigeeksree (Copper Contributor)

    As of Jan 30th, 11:30 AM, the Azure AI Foundry playground DeepSeek-R1 chat doesn't respond. Looks frozen. Is it out of capacity?

  • Hi aigeeksree, this blog post doesn't use Azure AI Foundry to run DeepSeek-R1. Instead, it outlines how to host your own DeepSeek-R1 model on Azure Container Apps serverless GPUs. We have capacity for serverless GPUs if you want to try out the blog post steps. However, I am unaware of the state of Azure AI Foundry capacity.