Apps on Azure Blog

DeepSeek-R1 on Azure Container Apps Serverless GPUs

Cary_Chai (Microsoft)
Jan 29, 2025

The world of AI is evolving at a breakneck pace with new models constantly being created. With so much rapid innovation, it is essential to have the flexibility to quickly adapt applications to the latest models. This is where Azure Container Apps serverless GPUs come in.

Azure Container Apps is a managed serverless container platform that enables you to deploy and run containerized applications while reducing infrastructure management and saving costs.

With serverless GPU support, you get the flexibility to bring any containerized workload, including new language models, and deploy it to a platform that automatically scales with your customer demand. In addition, you get optimized cold starts, per-second billing, and reduced operational overhead, so you can focus on the core components of your applications when using GPUs. All the while, you can run your AI applications alongside your non-AI apps on the same platform, within the same environment, sharing networking, observability, and security capabilities.

DeepSeek-R1 and Ollama

DeepSeek-R1 is a model released just last week that has quickly gained traction in the AI space. In this blog post, we'll go over how you can easily deploy DeepSeek-R1 using Azure Container Apps serverless GPUs and Ollama. This guide showcases how to deploy DeepSeek-R1, but the same steps apply to any model you can find in Ollama's library.

Prerequisites

  • An Azure account with an active subscription.
  • Serverless GPU quota for Azure Container Apps. Request quota here.
    Note: All EA customers have quota for 1 T4 GPU by default. In the future, all EA customers will also have quota for 1 A100 by default, and all paying customers will get quota for 1 T4 GPU by default.

Deploy Azure Container Apps resources

  1. Go to the Azure Portal.

  2. Click Create a resource.

  3. Search for Azure Container Apps.

  4. Select Container App and Create.

  5. On the Basics tab, you can leave most of the defaults. For region, select West US 3, Australia East, or Sweden Central. These are the regions where Azure Container Apps serverless GPUs are supported.

  6. In the Container tab, fill in the following details. The container that will be deployed bundles Ollama and Open WebUI together; see the Open WebUI project for more details on the container. (A scripted alternative using the Azure SDK for Python is sketched after this list.)

    Field                  Value
    Image source           Docker Hub or other registries
    Image type             Public
    Registry login server  ghcr.io
    Image and tag          open-webui/open-webui:ollama
    Workload profile       Consumption
    GPU (preview)          Check the box
    GPU Type               T4

    (Note: A100 GPUs are also supported, but for this guide, we'll be using T4 GPUs.)

    *If you don't have quota for serverless GPUs in Azure Container Apps, request quota here.

  7. In the Ingress tab, fill in the following details:
    Field            Value
    Ingress          Enabled
    Ingress traffic  Accepting traffic from anywhere
    Target port      8080


  8. Select Review + Create at the bottom of the page, then select Create.
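
If you'd rather script the deployment than click through the portal, the same app can be created with the Azure SDK for Python. The following is a minimal sketch, not a verified recipe: it assumes the azure-mgmt-appcontainers package and an existing Container Apps environment that already has a serverless T4 GPU workload profile; my-rg, my-env, gpu-t4, and deepseek-r1-demo are placeholder names, and the exact workload-profile fields can vary by API version.

    # pip install azure-identity azure-mgmt-appcontainers
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.appcontainers import ContainerAppsAPIClient
    from azure.mgmt.appcontainers.models import (
        Configuration, Container, ContainerApp, Ingress, Template,
    )

    # Placeholders -- substitute your own values.
    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "my-rg"
    # An existing Container Apps environment with a serverless GPU workload profile.
    ENVIRONMENT_ID = (
        f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
        "/providers/Microsoft.App/managedEnvironments/my-env"
    )

    client = ContainerAppsAPIClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    app = ContainerApp(
        location="westus3",                     # a region with serverless GPU support
        managed_environment_id=ENVIRONMENT_ID,
        workload_profile_name="gpu-t4",         # assumed name of the T4 profile
        configuration=Configuration(
            ingress=Ingress(external=True, target_port=8080),
        ),
        template=Template(
            containers=[
                Container(
                    name="open-webui",
                    image="ghcr.io/open-webui/open-webui:ollama",
                )
            ]
        ),
    )

    poller = client.container_apps.begin_create_or_update(
        RESOURCE_GROUP, "deepseek-r1-demo", app
    )
    print(poller.result().latest_revision_fqdn)  # the app's public hostname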

Access Ollama Web UI

  1. Once your deployment is complete, select Go to resource.

  2. Select the Application Url for your container app. This launches the app and starts the container.

    Note: You will see a longer cold start in this tutorial. Cold start for Azure Container Apps serverless GPUs is optimized when using Azure Container Registry (ACR) with artifact streaming enabled. To get faster cold start times, you can pull the image into your own ACR and enable artifact streaming. Steps are here.

Use DeepSeek-R1

  1. Once your container starts up, follow the prompts to get started.

  2. You will land on the Open WebUI chat page. Click Select a model in the top-left corner and enter deepseek-r1:14b into the search box. This is the 14-billion-parameter Qwen-based model. Alternatively, you can use the Llama-based models such as deepseek-r1:8b. For a full list of available models, see the Ollama library.

  3. Select Pull "deepseek-r1:14b" from Ollama.com. The UI shows the progress of the model download. (A scripted equivalent using Ollama's REST API is sketched after this list.)

  4. Once downloaded, open the Select a model dropdown in the top left again and select deepseek-r1:14b.

  5. Use the central chat box to begin using the model. An example prompt is: What are the benefits of Azure Container Apps?
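
If you want to script these steps instead of clicking through the Web UI, Ollama itself exposes a small REST API for pulling and chatting with models. Below is a minimal Python sketch; it assumes Ollama's API port (11434 by default) is reachable, for example when running the same open-webui:ollama image locally or when you additionally expose that port on your container app (the portal setup above only exposes the Web UI on port 8080).

    # pip install requests
    import requests

    OLLAMA_URL = "http://localhost:11434"  # replace with your Ollama endpoint
    MODEL = "deepseek-r1:14b"

    # Pull the model (the scripted equivalent of the "Pull ... from Ollama.com"
    # button). The pull endpoint streams progress as JSON lines.
    with requests.post(f"{OLLAMA_URL}/api/pull",
                       json={"model": MODEL}, stream=True, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                print(line.decode())  # e.g. {"status":"pulling manifest"} ...

    # Chat with the model once the pull completes.
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": MODEL,
            "messages": [{"role": "user",
                          "content": "What are the benefits of Azure Container Apps?"}],
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])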

Congratulations!

You have now successfully gotten up and running with DeepSeek-R1 on Azure Container Apps! As mentioned previously, the same steps apply to any model you can find in Ollama's library. In addition, Azure Container Apps is a completely workload-agnostic compute platform: you can bring any Linux-based container for your AI workloads and run it on serverless GPUs.

Please comment below to let us know what you think of the experience and any AI workloads you're deploying to Azure Container Apps. 

Next Steps

Azure Container Apps containers are ephemeral by default and don't include mounted storage. To persist your data and conversations, you can add a volume mount to your Azure Container App. For steps on how to add a volume mount, follow the steps here.
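
In SDK terms, picking up the sketch from earlier, persistence means adding a volume to the app template and mounting it into the container. This is again a hedged sketch: it assumes an Azure Files storage definition named ollama-files has already been added to your Container Apps environment, and all names are placeholders. Ollama stores downloaded models under /root/.ollama, so mounting the share there keeps models across restarts.

    # Assumes the azure-mgmt-appcontainers models from the earlier sketch.
    from azure.mgmt.appcontainers.models import (
        Container, Template, Volume, VolumeMount,
    )

    template = Template(
        volumes=[
            # "ollama-files" is a placeholder for an Azure Files storage
            # definition already registered on the environment.
            Volume(name="ollama-data",
                   storage_type="AzureFile",
                   storage_name="ollama-files"),
        ],
        containers=[
            Container(
                name="open-webui",
                image="ghcr.io/open-webui/open-webui:ollama",
                volume_mounts=[
                    # Ollama keeps downloaded models under /root/.ollama.
                    VolumeMount(volume_name="ollama-data",
                                mount_path="/root/.ollama"),
                ],
            )
        ],
    )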

To get faster cold start times, you can also pull the image into your own ACR and enable artifact streaming. See the tutorial for enabling artifact streaming.

Updated Jan 29, 2025

Comments
  • Great sharing Cary_Chai 
    Wanted to check whether a similar solution would also apply to AKS running on Azure Local with a dedicated GPU installed?

  • Ekurt (Copper Contributor)

    Thx Cary_Chai for this documentation, 

    How can I ensure that I don't have to reinstall all models after restarting the container app?
    I am currently facing this problem (I have added a storage account as a volume).

  • juwu (Copper Contributor)

    Can we use the 671b model? What GPU option do we need to use if we want to try the real model?

  • What is the exact cost for DeepSeek on ACA? Just like ordinary GPU-enabled ACA costing?

  • aigeeksree (Copper Contributor)

    As of Jan 30th, 11:30 AM, the Azure AI Foundry playground DeepSeek-R1 chat doesn't respond. Looks frozen. Is it out of capacity?

  • Hi aigeeksree, this blog post doesn't use Azure AI Foundry to run DeepSeek-R1. Instead, it outlines how to host your own DeepSeek-R1 model on Azure Container Apps serverless GPUs. We have capacity for serverless GPUs if you want to try out the blog post steps. However, I am unaware of the state of Azure AI Foundry capacity.