Azure High Performance Computing (HPC) Blog

5 MIN READ

Running DeepSeek-R1 on a single NDv5 MI300X VM

jesselopez

Microsoft

Feb 01, 2025

Contributors: Davide Vanzo, Yuval Mazor, Jesse Lopez

DeepSeek-R1 is an open-weights reasoning model built on DeepSeek-V3, designed for conversational AI, coding, and complex problem-solving. It has gained significant attention beyond the AI/ML community due to its strong reasoning capabilities, often competing with OpenAI’s models. One of its key advantages is that it can be run locally, giving users full control over their data.

The NDv5 MI300X VM features 8x AMD Instinct MI300X GPUs, each equipped with 192GB of HBM3 and interconnected via Infinity Fabric 3.0. With up to 5.2 TB/s of memory bandwidth per GPU, the MI300X provides the necessary capacity and speed to process large models efficiently - enabling users to run DeepSeek-R1 at full precision on a single VM.

In this blog post, we’ll walk you through the steps to provision an NDv5 MI300X instance on Azure and run DeepSeek-R1 for inference using the SGLang inference framework.

Launching an NDv5 MI300X VM

Prerequisites

Check that your subscription has sufficient vCPU quota for the VM family “StandardNDI Sv 5MI300X” (see Quota documentation).
If needed, contact your Microsoft account representative to request quota increase.
A Bash terminal with Azure CLI installed and logged into the appropriate tenant. Alternatively, Azure Cloud Shell can also be employed.

Provision the VM

1. Using Azure CLI, create an Ubuntu-22.04 VM on ND_MI300x_v5:

az group create --location <REGION> -n <RESOURCE_GROUP_NAME> 
az vm create --name mi300x --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --os-disk-delete-option Delete --admin-username azureadmin --ssh-key-values <PUBLIC_SSH_PATH>

Optionally, the deployment can utilize the cloud-init.yaml file specified as --custom-data <CLOUD_INIT_FILE_PATH> to automate the additional preparation described below:

az vm create --name mi300x --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --os-disk-delete-option Delete --admin-username azureadmin --ssh-key-values <PUBLIC_SSH_PATH> --custom-data <CLOUD_INIT_FILE_PATH>

Note: The GPU drivers may take a couple of mintues to completely load after the VM has been initially created.

Additional preparation

Beyond provisioning the VM, there are additional steps to prepare the environment to optimally run DeepSeed, or other AI workloads including setting-up the 8 NVMe disks on the node in a RAID-0 configuration to act as the cache location for Docker and Hugging Face.

The following steps assume you have connected to the VM and working in a Bash shell.

1. Prepare the NVMe disks in a RAID-0 configuration

mkdir -p /mnt/resource_nvme/
sudo mdadm --create /dev/md128 -f --run --level 0 --raid-devices 8 $(ls /dev/nvme*n1)  
sudo mkfs.xfs -f /dev/md128 
sudo mount /dev/md128 /mnt/resource_nvme 
sudo chmod 1777 /mnt/resource_nvme

2. Configure Hugging Face to use the RAID-0. This environmental variable should also be propagated to any containers pulling images or data from Hugging Face.

mkdir –p /mnt/resource_nvme/hf_cache 
export HF_HOME=/mnt/resource_nvme/hf_cache

3. Configure Docker to use the RAID-0

mkdir -p /mnt/resource_nvme/docker 
sudo tee /etc/docker/daemon.json > /dev/null <<EOF 
{ 
    "data-root": "/mnt/resource_nvme/docker" 
} 
EOF 
sudo chmod 0644 /etc/docker/daemon.json 
sudo systemctl restart docker

All of these additional preperation steps can be automated in VM creation using cloud-init. The example cloud-init.yaml file can be used in provisioning the VM as described above.

#cloud-config
package_update: true
write_files:
  - path: /opt/setup_nvme.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      NVME_DISKS_NAME=`ls /dev/nvme*n1`
      NVME_DISKS=`ls -latr /dev/nvme*n1 | wc -l`

      echo "Number of NVMe Disks: $NVME_DISKS"

      if [ "$NVME_DISKS" == "0" ]
      then
          exit 0
      else
          mkdir -p /mnt/resource_nvme
          # Needed incase something did not unmount as expected. This will delete any data that may be left behind
          mdadm  --stop /dev/md*
          mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
          mkfs.xfs -f /dev/md128
          mount /dev/md128 /mnt/resource_nvme
      fi

      chmod 1777 /mnt/resource_nvme
  - path: /etc/profile.d/hf_home.sh
    permissions: '0755'
    content: |
      export HF_HOME=/mnt/resource_nvme/hf_cache
  - path: /etc/docker/daemon.json
    permissions: '0644'
    content: |
      {
        "data-root": "/mnt/resource_nvme/docker"
      }
runcmd:
  - ["/bin/bash", "/opt/setup_nvme.sh"]
  - mkdir -p /mnt/resource_nvme/docker
  - mkdir -p /mnt/resource_nvme/hf_cache
  # PAM group not working for docker group, so this will add all users to docker group
  - bash -c 'for USER in $(ls /home); do usermod -aG docker $USER; done'
  - systemctl restart docker

Using MI300X

If you are familiar with Nvidia and CUDA tools and environment, AMD provides equivalents as part of the ROCm stack.

MI300X + ROCm	Nvidia + CUDA	Description
rocm-smi	nvidia-smi	CLI for monitoring the system and making changes
rccl	nccl	Library for communication between GPUs

Running DeepSeek-R1

1. Pull the container image. It is O(10) GB in size, so it may take a few minutes to download.

docker pull rocm/sglang-staging:20250303

2. Start the SGLang server. The model (~642 GB) is downloaded the first time it is launched and will take at least a few minutes to download. Once the application outputs “The server is fired up and ready to roll!”, you can begin making queries to the model.

docker run \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --cap-add=SYS_PTRACE \
  --group-add video \
  --privileged \
  --shm-size 32g \
  --ipc=host \
  -p 30000:30000 \
  -v /mnt/resource_nvme:/mnt/resource_nvme \
  -e HF_HOME=/mnt/resource_nvme/hf_cache \
  -e HSA_NO_SCRATCH_RECLAIM=1 \
  -e GPU_FORCE_BLIT_COPY_SIZE=64 \
  -e DEBUG_HIP_BLOCK_SYN=1024 \
  rocm/sglang-staging:20250303 \
  python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0

3. You can now make queries to DeepSeek-R1. For example, these requests to the model from another shell on same host provide model data and will generate a sample response.

curl http://localhost:30000/get_model_info 
{"model_path":"deepseek-ai/DeepSeek-R1","tokenizer_path":"deepseek-ai/DeepSeek-R1","is_generation":true} 
curl http://localhost:30000/generate -H "Content-Type: application/json" -d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0.6 } }'

Conclusion

In this post, we detail how to run the full-size 671B DeepSeek-R1 model on a single Azure NDv5 MI300X instance. This includes setting up the machine, installing the necessary drivers, and executing the model. Happy inferencing!

References

https://github.com/deepseek-ai/DeepSeek-R1

https://github.com/deepseek-ai/DeepSeek-V3

https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html

https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/azure-announces-new-ai-optimized-vm-series-featuring-amd%e2%80%99s-flagship-mi300x-gpu/3980770

https://docs.sglang.ai/index.html

Updated Mar 10, 2025

Version 9.0

ai infrastructure

hpc

virtual machines