Contributors: Davide Vanzo, Yuval Mazor, Jesse Lopez
DeepSeek-R1 is an open-weights reasoning model built on DeepSeek-V3, designed for conversational AI, coding, and complex problem-solving. It has gained significant attention beyond the AI/ML community for its strong reasoning capabilities, with reported benchmark results comparable to OpenAI’s o1. One of its key advantages is that, because the weights are openly available, it can be run locally, giving users full control over their data.
The NDv5 MI300X VM features eight AMD Instinct MI300X GPUs, each equipped with 192 GB of HBM3 and interconnected via Infinity Fabric 3.0. With 1.5 TB of HBM3 in total and up to 5.2 TB/s of memory bandwidth per GPU, the VM provides the capacity and speed needed to process large models efficiently, enabling users to run DeepSeek-R1 at its full released precision (FP8 weights, without further quantization) on a single VM.
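As a back-of-the-envelope check of the single-VM claim, the sketch below multiplies out the GPU memory and compares it with the ~642 GB checkpoint size quoted in the "Running DeepSeek-R1" section. The helper name `fit_check` is hypothetical, and the per-GPU figure assumes that tensor parallelism (`--tp 8` in the launch command) splits the weights roughly evenly; real memory use also includes the KV cache and runtime buffers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): rough memory budget for
# sharding a checkpoint across GPUs with tensor parallelism.
# Assumes weights split evenly; ignores KV cache, activations and runtime overhead.
fit_check() {
  local gpus=$1 hbm_per_gpu_gb=$2 weights_gb=$3
  local total_gb=$(( gpus * hbm_per_gpu_gb ))
  # Ceiling division: approximate weight share per GPU
  local per_gpu_gb=$(( (weights_gb + gpus - 1) / gpus ))
  # Memory left for KV cache and activations (negative means it cannot fit)
  local headroom_gb=$(( total_gb - weights_gb ))
  echo "total=${total_gb}GB per_gpu_weights=${per_gpu_gb}GB headroom=${headroom_gb}GB"
}

# 8 x MI300X (192 GB each) vs. the ~642 GB DeepSeek-R1 download
fit_check 8 192 642
# prints: total=1536GB per_gpu_weights=81GB headroom=894GB
```

Positive headroom is necessary but not sufficient: the serving framework reserves part of the remaining memory for the KV cache, so the practical limit on batch size and context length is lower than the raw difference suggests.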
In this blog post, we’ll walk you through the steps to provision an NDv5 MI300X instance on Azure and run DeepSeek-R1 for inference using the SGLang inference framework.
Launching an NDv5 MI300X VM
Prerequisites
- Check that your subscription has sufficient vCPU quota for the VM family “StandardNDISv5MI300X” (see Quota documentation). A single Standard_ND96isr_MI300X_v5 VM requires 96 vCPUs.
- If needed, contact your Microsoft account representative to request a quota increase.
- A Bash terminal with the Azure CLI installed and logged in to the appropriate tenant. Alternatively, use Azure Cloud Shell.
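To check the quota from the CLI, a sketch like the following queries `az vm list-usage` for the target region. The helper name is hypothetical, and the filter assumes the quota family's `name.value` contains the string `MI300X`; if nothing matches, compare against the full `az vm list-usage --location <REGION> --output table` listing and adjust the filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): succeeds when the region has
# at least <needed> unused vCPUs in the first quota family whose name contains "MI300X".
# Assumption: the MI300X family name contains "MI300X"; adjust the filter if not.
has_mi300x_quota() {
  local region=$1 needed=${2:-96}   # Standard_ND96isr_MI300X_v5 uses 96 vCPUs
  local limit="" used=""
  # Each matching family becomes one tsv row: "<limit><TAB><currentValue>"
  read -r limit used < <(az vm list-usage --location "$region" \
    --query "[?contains(name.value, 'MI300X')].[limit, currentValue]" \
    --output tsv) || true
  if [[ -z "$limit" ]]; then
    echo "No MI300X quota family found in $region" >&2
    return 2
  fi
  echo "MI300X vCPU quota in $region: used=$used limit=$limit"
  (( limit - used >= needed ))
}

# Usage (after `az login`):
#   has_mi300x_quota <REGION> && echo "enough quota" || echo "request an increase"
```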
Provision the VM
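1. Using the Azure CLI, create a resource group and then an Ubuntu 22.04 ND MI300X v5 VM (size Standard_ND96isr_MI300X_v5) from the ROCm-enabled Ubuntu HPC image: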
az group create --location <REGION> -n <RESOURCE_GROUP_NAME>
az vm create \
  --name mi300x \
  --resource-group <RESOURCE_GROUP_NAME> \
  --location <REGION> \
  --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 \
  --size Standard_ND96isr_MI300X_v5 \
  --security-type Standard \
  --os-disk-size-gb 256 \
  --os-disk-delete-option Delete \
  --admin-username azureadmin \
  --ssh-key-values <PUBLIC_SSH_PATH>
Optionally, pass the example cloud-init.yaml file from the next section with --custom-data <CLOUD_INIT_FILE_PATH> to automate the additional preparation steps:
az vm create \
  --name mi300x \
  --resource-group <RESOURCE_GROUP_NAME> \
  --location <REGION> \
  --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 \
  --size Standard_ND96isr_MI300X_v5 \
  --security-type Standard \
  --os-disk-size-gb 256 \
  --os-disk-delete-option Delete \
  --admin-username azureadmin \
  --ssh-key-values <PUBLIC_SSH_PATH> \
  --custom-data <CLOUD_INIT_FILE_PATH>
Note: The GPU drivers may take a couple of minutes to finish loading after the VM is first created.
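Rather than guessing when the drivers are ready, you can poll for the `/dev/kfd` device node (the same device passed to the container later in this post) and then run `rocm-smi` to confirm that all eight GPUs are listed. This is a minimal sketch: the helper name, the 5-second poll interval, and the 10-minute default timeout are arbitrary choices, and the presence of `/dev/kfd` is only a coarse signal that the amdgpu driver has loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): wait until a device node exists.
# Default path is /dev/kfd, which appears once the amdgpu/KFD driver is loaded.
wait_for_gpus() {
  local dev=${1:-/dev/kfd} timeout=${2:-600} waited=0
  until [[ -e "$dev" ]]; do
    if (( waited >= timeout )); then
      echo "Timed out after ${timeout}s waiting for $dev" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "$dev is present"
}

# Usage on the VM: wait_for_gpus && rocm-smi
```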
Additional preparation
Beyond provisioning the VM, a few additional steps prepare the environment to run DeepSeek-R1 or other AI workloads optimally. The main one is combining the node's eight local NVMe disks into a RAID-0 array that serves as the cache location for Docker and Hugging Face.
The following steps assume you have connected to the VM and are working in a Bash shell.
1. Prepare the NVMe disks in a RAID-0 configuration (this erases any data already on them):
sudo mkdir -p /mnt/resource_nvme
sudo mdadm --create /dev/md128 -f --run --level 0 --raid-devices 8 $(ls /dev/nvme*n1)
sudo mkfs.xfs -f /dev/md128
sudo mount /dev/md128 /mnt/resource_nvme
sudo chmod 1777 /mnt/resource_nvme
2. Configure Hugging Face to use the RAID-0 array. The export below applies only to the current shell session, and the environment variable must also be passed to any container that downloads models or data from Hugging Face.
mkdir -p /mnt/resource_nvme/hf_cache
export HF_HOME=/mnt/resource_nvme/hf_cache
3. Configure Docker to use the RAID-0 array:
mkdir -p /mnt/resource_nvme/docker
sudo tee /etc/docker/daemon.json > /dev/null <<EOF
{
"data-root": "/mnt/resource_nvme/docker"
}
EOF
sudo chmod 0644 /etc/docker/daemon.json
sudo systemctl restart docker
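Before moving on, a quick sanity check can confirm that all three steps took effect. The sketch below is hypothetical (the function name and messages are illustrative): it uses `findmnt` to confirm the mount, compares `HF_HOME`, and reads Docker's data root with `docker info --format '{{.DockerRootDir}}'`, which may require `sudo` if your user is not yet in the `docker` group.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): verify the RAID-0 mount,
# HF_HOME, and Docker data root configured in the steps above.
check_nvme_setup() {
  local mnt=${1:-/mnt/resource_nvme} root
  if ! findmnt "$mnt" >/dev/null; then
    echo "$mnt is not a mount point" >&2
    return 1
  fi
  if [[ "${HF_HOME:-}" != "$mnt/hf_cache" ]]; then
    echo "HF_HOME is '${HF_HOME:-unset}', expected $mnt/hf_cache" >&2
    return 1
  fi
  root=$(docker info --format '{{.DockerRootDir}}')
  if [[ "$root" != "$mnt/docker" ]]; then
    echo "Docker data root is '$root', expected $mnt/docker" >&2
    return 1
  fi
  echo "RAID-0 mount, HF_HOME and Docker data root look correct"
}

# Usage: check_nvme_setup && cat /proc/mdstat
```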
All of these preparation steps can be automated at VM creation time with cloud-init. The example cloud-init.yaml below can be passed with --custom-data when provisioning the VM as described above.
#cloud-config
package_update: true
write_files:
  - path: /opt/setup_nvme.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      # Find the local NVMe disks and count them
      NVME_DISKS_NAME=$(ls /dev/nvme*n1 2>/dev/null)
      NVME_DISKS=$(ls /dev/nvme*n1 2>/dev/null | wc -l)
      echo "Number of NVMe Disks: $NVME_DISKS"
      if [ "$NVME_DISKS" == "0" ]
      then
          exit 0
      else
          mkdir -p /mnt/resource_nvme
          # Needed in case something did not unmount as expected. This will delete any data that may be left behind
          mdadm --stop /dev/md*
          mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
          mkfs.xfs -f /dev/md128
          mount /dev/md128 /mnt/resource_nvme
      fi
      chmod 1777 /mnt/resource_nvme
  - path: /etc/profile.d/hf_home.sh
    permissions: '0755'
    content: |
      export HF_HOME=/mnt/resource_nvme/hf_cache
  - path: /etc/docker/daemon.json
    permissions: '0644'
    content: |
      {
        "data-root": "/mnt/resource_nvme/docker"
      }
runcmd:
  - ["/bin/bash", "/opt/setup_nvme.sh"]
  - mkdir -p /mnt/resource_nvme/docker
  - mkdir -p /mnt/resource_nvme/hf_cache
  # runcmd runs as root, so open up the Hugging Face cache for non-root users
  - chmod 1777 /mnt/resource_nvme/hf_cache
  # PAM group not working for docker group, so this will add all users to docker group
  - bash -c 'for USER in $(ls /home); do usermod -aG docker $USER; done'
  - systemctl restart docker
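Because cloud-init runs in the background after the VM boots, it helps to wait for it before using the disks. `cloud-init status --wait` blocks until cloud-init finishes; the small hypothetical helper below then checks `/proc/mdstat` for the `md128` array that `setup_nvme.sh` creates. The mdstat path is an optional argument only so the check can be exercised outside the VM.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): succeed if md128 is listed
# as active in mdstat (default /proc/mdstat).
raid_is_active() {
  local mdstat=${1:-/proc/mdstat}
  grep -q '^md128 : active' "$mdstat"
}

# Usage on the VM, after provisioning with --custom-data:
#   cloud-init status --wait && raid_is_active && findmnt /mnt/resource_nvme
```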
Using MI300X
If you are familiar with NVIDIA's CUDA tools and environment, AMD provides equivalents as part of the ROCm stack:
| MI300X + ROCm | NVIDIA + CUDA | Description |
|---|---|---|
| rocm-smi | nvidia-smi | CLI for monitoring GPUs and changing their settings |
| RCCL | NCCL | Library for collective communication between GPUs |
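As a concrete example of this mapping, the everyday `nvidia-smi` status check has a `rocm-smi` counterpart. The wrapper below is a hypothetical convenience function; the flags shown (`--showuse`, `--showmeminfo vram`, `--showtemp`, `--showpower`) are standard `rocm-smi` options, but check `rocm-smi --help` on your image, since available options can vary between ROCm releases.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the original post): one-shot GPU status,
# roughly what a plain `nvidia-smi` call shows on NVIDIA systems.
gpu_summary() {
  # GPU utilization, VRAM usage, temperature and power draw for every GPU
  rocm-smi --showuse --showmeminfo vram --showtemp --showpower
}

# Usage: gpu_summary            (one-shot)
#        watch -n 1 rocm-smi    (continuous, like `watch nvidia-smi`)
```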
Running DeepSeek-R1
1. Pull the container image. It is on the order of 10 GB, so the download may take a few minutes.
docker pull rocm/sglang-staging:20250303
2. Start the SGLang server. On the first launch, the model weights (~642 GB) are downloaded into the Hugging Face cache, which takes at least a few minutes; later launches reuse the cache. Once the application outputs “The server is fired up and ready to roll!”, you can begin sending queries to the model.
docker run \
--device=/dev/kfd \
--device=/dev/dri \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--group-add video \
--privileged \
--shm-size 32g \
--ipc=host \
-p 30000:30000 \
-v /mnt/resource_nvme:/mnt/resource_nvme \
-e HF_HOME=/mnt/resource_nvme/hf_cache \
-e HSA_NO_SCRATCH_RECLAIM=1 \
-e GPU_FORCE_BLIT_COPY_SIZE=64 \
-e DEBUG_HIP_BLOCK_SYN=1024 \
rocm/sglang-staging:20250303 \
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0
3. You can now query DeepSeek-R1. For example, from another shell on the same host, the first request below returns model metadata (sample output shown) and the second generates a short sample completion.
curl http://localhost:30000/get_model_info
{"model_path":"deepseek-ai/DeepSeek-R1","tokenizer_path":"deepseek-ai/DeepSeek-R1","is_generation":true}
curl http://localhost:30000/generate -H "Content-Type: application/json" -d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0.6 } }'
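Instead of watching the container log for the ready message, a script can poll the server before sending requests. The sketch below retries the `/get_model_info` endpoint used above; the helper name, 10-second interval, and one-hour default timeout (to allow for the first ~642 GB download) are assumptions. The endpoint may start answering shortly before SGLang's warm-up finishes, so treat the log message as the authoritative signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): poll SGLang until
# /get_model_info responds successfully, or give up after <timeout> seconds.
wait_for_sglang() {
  local url=${1:-http://localhost:30000} timeout=${2:-3600} waited=0
  # -s: silent; -f: treat HTTP errors as failures (connection refused also fails)
  until curl -sf "$url/get_model_info" >/dev/null; do
    if (( waited >= timeout )); then
      echo "SGLang at $url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 10
    waited=$(( waited + 10 ))
  done
  echo "SGLang at $url is ready"
}

# Usage:
#   wait_for_sglang && curl http://localhost:30000/get_model_info
```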
Conclusion
In this post, we showed how to run the full-size 671B-parameter DeepSeek-R1 model on a single Azure NDv5 MI300X instance. This included provisioning the VM, preparing its local NVMe storage, and launching the model with SGLang. Happy inferencing!
Updated Mar 10, 2025
Version 9.0, by jesselopez (Microsoft)
Azure High Performance Computing (HPC) Blog