# Deploying Azure ND H100 v5 Instances in AKS with NVIDIA MIG GPU Slicing
In this article we will cover:

- **AKS Cluster Deployment (Latest Version)** – creating an AKS cluster using the latest Kubernetes version.
- **GPU Node Pool Provisioning** – adding an ND H100 v5 node pool on Ubuntu, with `--skip-gpu-driver-install` to disable automatic driver installation.
- **NVIDIA H100 MIG Slicing Configurations** – available MIG partition profiles on the H100 GPU and how to enable them.
- **Workload Recommendations for MIG Profiles** – choosing optimal MIG slice sizes for different AI/ML and HPC scenarios.
- **Best Practices for MIG Management and Scheduling** – managing MIG in AKS, scheduling pods, and operational tips.

## AKS Cluster Deployment (Using the Latest Version)

**Install/Update Azure CLI:** Ensure you have Azure CLI 2.0.64 or later. This is required for the `--skip-gpu-driver-install` option and other recent features. Install (or update) the AKS preview extension if needed:

```bash
az extension add --name aks-preview
az extension update --name aks-preview
```

(Preview features are opt-in; the preview extension gives access to the latest AKS capabilities.)

**Create a Resource Group:** If not already done, create an Azure resource group for the cluster:

```bash
az group create -n MyResourceGroup -l eastus
```

**Create the AKS Cluster:** Run `az aks create` to create the AKS control plane. You can start with a default system node pool (e.g., a small VM for system pods) and no GPU nodes yet. For example:

```bash
az aks create -g MyResourceGroup -n MyAKSCluster \
  --node-vm-size Standard_D4s_v5 \
  --node-count 1 \
  --kubernetes-version <latest-stable-version> \
  --enable-addons monitoring
```

This creates a cluster named MyAKSCluster with one standard node. Use the `--kubernetes-version` flag to specify the latest AKS-supported Kubernetes version (or omit it to get the current default). As of early 2025, AKS supports Kubernetes 1.27+; using the newest version ensures support for features like MIG and the ND H100 v5 SKU.

**Retrieve Cluster Credentials:** Once created, get your Kubernetes credentials:

```bash
az aks get-credentials -g MyResourceGroup -n MyAKSCluster
```

**Verification:** After creation, you should have a running AKS cluster. You can verify the control plane is up with:

```bash
kubectl get nodes
```
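If you are unsure which Kubernetes versions your region currently offers for the `--kubernetes-version` flag used above, you can list them with the Azure CLI (the region name here is just an example):

```bash
# List AKS-supported Kubernetes versions available in a region
az aks get-versions --location eastus --output table
```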
## Adding an ND H100 v5 GPU Node Pool (Ubuntu + Skip Driver Install)

Next, add a GPU node pool using the ND H100 v5 VM size. The ND H100 v5 series VMs each come with 8× NVIDIA H100 80GB GPUs (640 GB total GPU memory), high-bandwidth interconnects, and 96 vCPUs – ideal for large-scale AI and HPC workloads. We will configure this node pool to run Ubuntu and skip the automatic NVIDIA driver installation, since we plan to manage drivers (and MIG settings) manually or via the NVIDIA GPU Operator.

Steps to add the GPU node pool:

**Use an Ubuntu node image:** AKS supports Ubuntu 20.04/22.04 for ND H100 v5 nodes. The default AKS Linux OS (Ubuntu) is suitable. We also set `--os-sku Ubuntu` to make this explicit (if your cluster's default is Azure Linux, note that Azure Linux is not currently supported for MIG node pools).

**Add the GPU node pool with the Azure CLI:**

```bash
az aks nodepool add \
  --cluster-name MyAKSCluster \
  --resource-group MyResourceGroup \
  --name h100np \
  --node-vm-size Standard_ND96isr_H100_v5 \
  --node-count 1 \
  --os-type Linux \
  --os-sku Ubuntu \
  --skip-gpu-driver-install \
  --node-taints sku=gpu:NoSchedule
```

Let's break down these parameters:

- `--node-vm-size Standard_ND96isr_H100_v5` selects the ND H100 v5 VM size (96 vCPUs, 8× H100 GPUs). Ensure your subscription has quota for this SKU in your region.
- `--node-count 1` starts with one GPU VM (scale as needed).
- `--skip-gpu-driver-install` (preview) tells AKS not to pre-install NVIDIA drivers on the node. We use it because we plan to handle drivers ourselves (via NVIDIA's GPU Operator, for better control). With this flag, new GPU nodes come up without NVIDIA drivers until you install them manually or via an operator.
- `--node-taints sku=gpu:NoSchedule` taints the GPU nodes so that regular pods are not scheduled on them accidentally. Only pods with a matching toleration (e.g., labeled for GPU use) can run on these nodes. This is a best practice to reserve expensive GPU nodes for GPU workloads.

**(Optional) Add labels:** To prepare for MIG configuration with the NVIDIA operator, you might add a label like `nvidia.com/mig.config=all-1g.10gb` to indicate the desired MIG slicing (explained later). We will address MIG configuration shortly, so adding such a label now is optional.

**Wait for the node pool to be ready:** Monitor the Azure CLI output or use `kubectl get nodes` until the new node appears. It should register in Kubernetes (initially in NotReady state while it configures). Since we skipped driver installation, the node will not expose GPU scheduling resources yet (no `nvidia.com/gpu` resource visible) until we complete the next step.

## Installing the NVIDIA Driver Manually (or via GPU Operator)

Because we used `--skip-gpu-driver-install`, the node does not have the NVIDIA driver or CUDA runtime out of the box. There are two main approaches to install the driver:

- Use the NVIDIA GPU Operator (Helm-based) to handle driver installation.
- Install drivers manually (e.g., run a DaemonSet that downloads and installs the `.run` package or Debian packages).

The NVIDIA GPU Operator manages drivers, the Kubernetes device plugin, and GPU monitoring components. By default, AKS GPU node pools come with the NVIDIA drivers and container runtime pre-installed, and in that case you deploy the Operator with its driver installation disabled to avoid conflicts. Because we used `--skip-gpu-driver-install`, there is no in-box driver on these nodes – so either let the Operator manage the driver (leave `driver.enabled` at its default of `true`), or keep driver management disabled and install the driver yourself before scheduling GPU workloads. Either way, the GPU Operator deploys the other necessary components, such as the Kubernetes device plugin and the DCGM exporter for monitoring.

### 2.1 Installing via NVIDIA GPU Operator

**Step 1: Add the NVIDIA Helm repository.** NVIDIA provides a Helm chart for the GPU Operator. Add the official NVIDIA Helm repo and update it:

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
```

This repository contains the gpu-operator chart and other NVIDIA Helm charts.

**Step 2: Install the GPU Operator via Helm.** Use Helm to install the GPU Operator into a dedicated namespace (e.g., gpu-operator). On AKS node pools that use the in-box drivers, disable the Operator's driver and toolkit deployment and specify the container runtime class used for NVIDIA workloads. For example:

```bash
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  --set driver.enabled=false \
  --set toolkit.enabled=false \
  --set operator.runtimeClass=nvidia-container-runtime
```

In the above command:

- `driver.enabled=false` and `toolkit.enabled=false` prevent the Operator from deploying the NVIDIA driver containers and container toolkit. Use these values when the drivers are already present on the nodes; with `--skip-gpu-driver-install`, leave them at their default of `true` so the Operator installs and manages the driver stack for you.
- `operator.runtimeClass=nvidia-container-runtime` aligns with the runtime class name configured on AKS for GPU support.
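If you prefer to keep these settings in version control, the same flags can be expressed as a small Helm values file. This is a minimal sketch mirroring the command above; flip `driver.enabled`/`toolkit.enabled` to `true` if the Operator should install the driver itself:

```yaml
# values.yaml – mirrors the flags shown above (adjust to your driver strategy)
driver:
  enabled: false      # set to true when --skip-gpu-driver-install is used and the Operator manages drivers
toolkit:
  enabled: false
operator:
  runtimeClass: nvidia-container-runtime
```

It can then be installed with `helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace -f values.yaml`.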
After a few minutes, Helm should report a successful deployment. For example:

```
NAME: gpu-operator
LAST DEPLOYED: Fri May  5 15:30:05 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

You can verify that the GPU Operator's pods are running in the cluster. The Operator deploys several DaemonSets, including the NVIDIA device plugin, the DCGM exporter, and others. For example, after installation you should see pods like the following in the gpu-operator namespace:

```
nvidia-dcgm-exporter-xxxxx                1/1   Running   0   60s
nvidia-device-plugin-daemonset-xxxxx      1/1   Running   0   60s
nvidia-mig-manager-xxxxx                  1/1   Running   0   4m
nvidia-driver-daemonset-xxxxx             1/1   Running   0   4m
gpu-operator-node-feature-discovery-...   1/1   Running   0   5m
... (other GPU operator pods) ...
```

Here we see the NVIDIA device plugin and NVIDIA DCGM exporter pods running on each GPU node, as well as other components. (Note: if you disabled driver management as in the command above, the nvidia-driver-daemonset may be present but left idle.)

**Step 3: Confirm the Operator's GPU validation.** The GPU Operator runs a CUDA validation job to verify everything is working. Check that the CUDA validation pod has completed successfully:

```bash
kubectl get pods -n gpu-operator -l app=nvidia-cuda-validator
```

Expected output:

```
NAME                          READY   STATUS      RESTARTS   AGE
nvidia-cuda-validator-bpvkt   0/1     Completed   0          3m56s
```

A Completed CUDA validator indicates the GPUs are accessible and the NVIDIA stack is functioning. At this point, you have the NVIDIA GPU Operator (with device plugin and DCGM exporter) installed via Helm on AKS.

## Verifying MIG on H100 with Node Pool Provisioning

Once the driver is installed and the NVIDIA device plugin is running, you can verify MIG. The process is similar to verifying MIG on A100, but the resource naming and GPU partitioning reflect H100 capabilities.

**Check node resources:**

```bash
kubectl describe node <h100-node-name>
```

If you chose the single MIG strategy, you might see:

```
Allocatable:
  nvidia.com/gpu: 56
```

for a node with 8 H100s × 7 MIG slices each = 56, or `nvidia.com/gpu: 24` if you used MIG2g (which yields 3 slices per GPU, so 8 × 3 = 24). If you chose the mixed MIG strategy (`mig.strategy=mixed`), you'll see something like:

```
Allocatable:
  nvidia.com/mig-1g.10gb: 56
```

or whichever MIG slice name is appropriate (e.g., `nvidia.com/mig-3g.40gb` for MIG3g).

**Confirm MIG in nvidia-smi:** On the node (for example, from a shell inside a GPU pod), `nvidia-smi -L` should list the MIG devices created on each GPU.

**Run a GPU workload:** For instance, run a quick CUDA container:

```bash
kubectl run mig-test --rm -ti \
  --image=nvidia/cuda:12.1.1-runtime-ubuntu22.04 \
  --limits="nvidia.com/gpu=1" \
  -- bash
```

Inside the container, `nvidia-smi` should confirm you have a MIG device, and CUDA tools (e.g., deviceQuery) should run successfully, indicating MIG is active and the driver is working.

## MIG Management on H100

The H100 supports several MIG profiles – predefined ways to slice the GPU. Each profile is denoted `<N>g.<M>gb`, meaning it uses N GPU compute slices (out of 7) and M GB of memory. Key H100 80GB MIG profiles include:

- **MIG 1g.10gb:** Each instance has 1/7 of the SMs and 10 GB of memory (1/8 of VRAM). This yields 7 instances per GPU (7 × 10 GB = 70 GB out of 80; a small portion is reserved). This is the smallest slice size and maximizes the number of instances (useful for many lightweight tasks).
- **MIG 1g.20gb:** Each instance has 1/7 of the SMs but 20 GB of memory (1/4 of VRAM), allowing up to 4 instances per GPU. This profile gives each instance more memory while still only a single compute slice – useful for memory-intensive workloads that don't need much compute.
- **MIG 2g.20gb:** Each instance gets 2/7 of the SMs and 20 GB of memory (2/8 of VRAM). Three instances can run on one GPU.
  This offers a balance: more compute per instance than 1g, with a moderate 20 GB of memory each.
- **MIG 3g.40gb:** Each instance has 3/7 of the SMs and 40 GB of memory (half the VRAM). Two instances fit on one H100, effectively splitting the GPU in half.
- **MIG 4g.40gb:** Each instance uses 4/7 of the SMs and 40 GB of memory. Only one such instance can exist per GPU (because it uses half the memory and more than half of the SMs). In principle, a 4g.40gb profile can be combined with a smaller profile on the same GPU (e.g., a 4g.40gb plus a 3g.40gb together use all 7/7 SMs and 80 GB). However, AKS node pools use a single uniform profile per GPU, so you typically wouldn't mix profiles on the same GPU in AKS.
- **MIG 7g.80gb:** This profile uses the entire GPU (all 7/7 SMs and 80 GB of memory). Essentially, MIG 7g.80gb is the full GPU as one instance (no slicing) – equivalent to not using MIG at all for that GPU.

These profiles illustrate the flexibility: you trade off the number of instances against the power of each instance. For example, MIG 1g.10gb gives you seven small GPUs, whereas MIG 3g.40gb gives you two much larger slices (each roughly half of an H100). All MIG instances are hardware-isolated, meaning each instance's performance is independent (one instance can't starve the others of GPU resources).

**Enabling MIG in AKS:** There are two main ways to configure MIG on an AKS node pool:

1. **At node pool creation (static MIG profile):** Azure allows specifying a GPU instance profile when creating the node pool. For example, adding `--gpu-instance-profile MIG1g` to the `az aks nodepool add` command would provision each H100 GPU in 1g mode (7× 10 GB instances per GPU). Supported profile names for H100 include MIG1g, MIG2g, MIG3g, MIG4g, and MIG7g (the same profile names used for A100, but on H100 they correspond to the sizes above). Important: once set, the MIG profile on a node pool cannot be changed without recreating the node pool. If you chose MIG1g, all GPUs in that node pool will be partitioned into 7 slices each, and you can't later switch those nodes to a different profile on the fly.

2. **Dynamically via the NVIDIA GPU Operator:** If you skipped the driver install (as we did) and are using the GPU Operator, you can let the Operator manage MIG. This involves labeling the node with a desired MIG layout. For example, `nvidia.com/mig.config=all-1g.10gb` means "partition all GPUs into 1g.10gb slices." The Operator's MIG Manager then enables MIG mode on the GPUs, creates the specified MIG instances, and marks the node ready when done. This approach offers flexibility – you can adjust the MIG profile later by changing the label and letting the Operator reconfigure (though it will drain, and may reboot, the node to apply changes). During reconfiguration the Operator also taints the node or marks the MIG config state as pending so that pods are not scheduled too early.

For our deployment, we opted to skip Azure's automatic MIG configuration and use the NVIDIA operator. If you followed the steps in section 2 and set the `nvidia.com/mig.config` label before node creation, the node will come up on first boot, install drivers, and then partition into the specified MIG profile. If not, you can label the node now and the Operator will configure MIG accordingly. For example:

```bash
kubectl label node <node-name> nvidia.com/mig.config=all-3g.40gb --overwrite
```

splits each GPU into two 3g.40gb instances. The Operator detects this and partitions the GPUs (the node may briefly go NotReady while MIG is being set up).
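To see when the MIG Manager has finished, you can watch the state label it maintains on the node. This is a quick check; the label name assumes the GPU Operator's default MIG Manager configuration:

```bash
# Reports pending/rebooting/success (or failed) while the MIG Manager reconfigures the GPUs
kubectl get node <node-name> \
  -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```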
After MIG is configured, verify the node's GPU resources again. Depending on the MIG strategy (see the next section), you will either see a larger number of generic `nvidia.com/gpu` resources or specifically named resources like `nvidia.com/mig-3g.40gb`. We will discuss how to schedule workloads onto these MIG instances next.

**Important considerations:**

- **Workload interruption:** Applying a new MIG configuration can disrupt running GPU workloads. It's advisable to drain the node or ensure that no critical workloads are running during the reconfiguration.
- **Node reboot:** Depending on the environment and GPU model, enabling or modifying MIG configurations might require a node reboot. Ensure your system is prepared for potential reboots to prevent unexpected downtime.

## Workload Recommendations for MIG Profiles (AI/ML vs. HPC)

Different MIG slicing configurations suit different types of workloads. Here are recommendations for AI/ML and HPC scenarios:

**Full GPU (MIG 7g.80gb or MIG disabled)** – Best for the largest and most intensive tasks. If you are training large deep learning models (e.g., GPT-style models or complex computer vision training) or running HPC simulations that fully utilize a GPU, use the entire H100 GPU. The ND H100 v5 is designed to excel at these demanding workloads. In Kubernetes, you simply schedule pods that request a whole GPU (if MIG mode is enabled with the 7g.80gb profile, each GPU is one resource unit). This ensures maximum performance for jobs that can utilize 80 GB of GPU memory and all compute units. HPC workloads such as physics simulations, CFD, and weather modeling typically fall here – they are optimized to use full GPUs or even multiple GPUs in parallel, so slicing a GPU could impede their performance unless you explicitly want to run multiple smaller HPC jobs on one card.

**Large MIG partitions (3g.40gb or 4g.40gb)** – Good for moderately large models or jobs that don't quite need a full H100. For instance, you can split an H100 into 2× 3g.40gb instances, each with 40 GB of VRAM and roughly 43% of the H100's compute. This configuration is popular for AI model serving and inference where a full H100 would be underutilized. In some cases, two MIG 3g.40gb instances on an H100 can serve models with performance equal to or better than two full A100 GPUs, at lower cost. Each 3g.40gb slice is roughly comparable to an A100 40GB in capability, and it also unlocks H100-specific features (such as FP8 precision for inference). Use cases:

- Serving two large ML models concurrently (each model up to 40 GB in size, such as certain GPT-XXL or vision models), with each model on its own dedicated MIG slice.
- Running two medium-sized training jobs on one physical GPU. For example, two separate experiments that each need ~40 GB of GPU memory can run in parallel, each on a MIG 3g.40gb. This can increase throughput for hyperparameter tuning or multi-user environments.
- HPC batch jobs: if you have HPC tasks that fit in half a GPU (perhaps memory-bound tasks, or jobs that only need ~50% of the GPU's FLOPs), two 3g.40gb instances allow two jobs to run concurrently on one GPU with minimal interference.

MIG 4g.40gb (one 40 GB instance using ~57% of the compute) is a less common choice by itself – since only one 4g instance can exist per GPU, it leaves some capacity unused (the remaining 3/7 of the SMs would sit idle). It could be used in a mixed-profile scenario (one 4g plus one 3g on the same GPU) if configured manually. In AKS, which uses uniform profiles per node pool, you'd typically prefer 3g.40gb if you want two equal halves, or just use full GPUs. So in practice, stick with 3g.40gb for a clean two-way split on the H100.
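As an illustration, here is a minimal pod sketch that requests one 3g.40gb slice under the mixed strategy and tolerates the `sku=gpu:NoSchedule` taint applied earlier. The pod name and command are placeholders, not part of the original setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-3g-demo              # hypothetical name
spec:
  tolerations:
    - key: "sku"                 # matches --node-taints sku=gpu:NoSchedule on the node pool
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: app
      image: nvidia/cuda:12.1.1-runtime-ubuntu22.04   # reusing the test image from above
      command: ["sleep", "infinity"]                   # replace with your inference/training entrypoint
      resources:
        limits:
          nvidia.com/mig-3g.40gb: 1   # under the single strategy, request nvidia.com/gpu: 1 instead
```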
**Medium MIG partitions (2g.20gb)** – Good for multiple medium workloads. This profile yields 3 instances per GPU, each with 20 GB of memory and about 28.6% of the compute. It is useful when you have several smaller training jobs or medium-sized inference tasks running concurrently. Examples:

- Serving three different ML models (each ~15–20 GB in size) from one H100 node, each model on its own MIG 2g.20gb instance.
- Running three parallel training jobs for smaller models or prototyping (each job can use 20 GB of GPU memory). For instance, three data scientists can share one H100 GPU server, each getting what is effectively a "20 GB GPU." Each 2g.20gb MIG slice should outperform a V100 (16 GB) in both memory and compute, so this is still a hefty slice for many models.
- In an HPC context, if you have many lighter GPU-accelerated tasks (for example, three independent tasks that each use ~1/3 of a GPU), this profile lets them share a node efficiently.

**Small MIG partitions (1g.10gb)** – Ideal for high-density inference and lightweight workloads. This profile creates 7 instances per GPU, each with 10 GB of VRAM and 1/7 of the compute. It's perfect for AI inference microservices, model ensembles, or multi-tenant GPU environments:

- Deploying many small models or many instances of a model. For example, you could host seven different AI services (each requiring <10 GB of GPU memory) on one physical H100, each in its own isolated MIG slice. Many cloud providers use this approach to offer "fractional GPUs" to customers – e.g., a user can rent a 1g.10gb slice instead of the whole GPU.
- Running interactive workloads like Jupyter notebooks or development environments for multiple users on one GPU server. Each user can be assigned a MIG 1g.10gb slice for testing small-scale models or doing data science work without affecting others.
- Inference tasks that are memory-light but need GPU acceleration – e.g., running many inference requests in parallel across MIG slices (each slice still has ample compute for model scoring, and 10 GB is enough for many models such as smaller CNNs or transformers).

Keep in mind that 1g.10gb slices have the lowest compute per instance, so they suit workloads that individually don't need the full throughput of an H100; they shine when throughput comes from running many in parallel. (A short deployment sketch for this pattern follows below.)

**1g.20gb profile** – This one is a bit niche: 4 slices per GPU, each with 20 GB but only 1/7 of the SMs. You might use it if each task needs a large model (~20 GB) but isn't compute-intensive. An example is running four instances of a large language model in inference mode, where each instance is constrained by memory (loading a 15–18 GB model) but you deliberately limit its compute share so more instances can run concurrently. In practice, the 2g.20gb profile (same memory per instance, more compute) is usually preferable if you can use the extra SMs, so 1g.20gb only makes sense for truly compute-light, memory-heavy workloads or when you need exactly four isolated instances per GPU.
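For the high-density 1g.10gb pattern mentioned above, a Deployment that packs several replicas onto the MIG slices might look like the following sketch. The name and entrypoint are placeholders, and the resource name assumes the mixed strategy:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: small-inference            # hypothetical name
spec:
  replicas: 7                      # up to one replica per 1g.10gb slice on a single H100
  selector:
    matchLabels:
      app: small-inference
  template:
    metadata:
      labels:
        app: small-inference
    spec:
      tolerations:
        - key: "sku"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
      containers:
        - name: app
          image: nvidia/cuda:12.1.1-runtime-ubuntu22.04   # substitute your inference image
          command: ["sleep", "infinity"]                   # placeholder entrypoint
          resources:
            limits:
              nvidia.com/mig-1g.10gb: 1   # one slice per replica (or nvidia.com/gpu: 1 with the single strategy)
```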
**HPC workloads consideration:** Traditional HPC jobs (MPI applications, scientific computing) typically use either an entire GPU or none. MIG can be useful in HPC for capacity planning – e.g., running multiple smaller GPU-accelerated jobs simultaneously when they don't all require a full H100. But it introduces complexity, since the HPC scheduler must be aware of fractional GPUs, so many HPC scenarios instead use whole GPUs per job for simplicity. That said, for HPC inference or analytics (such as running multiple inference tasks on simulation output), MIG slicing can improve utilization. If jobs are latency-sensitive, MIG's isolation ensures one job doesn't impact another, which is beneficial for multi-tenant HPC clusters (for example, different teams sharing a GPU node).

In summary, choose the smallest MIG slice that still meets your workload's requirements. This maximizes overall GPU utilization and cost-efficiency by packing more tasks onto the hardware. Use larger slices or full GPUs only when a job truly needs the extra memory and compute. It's often a good strategy to create multiple GPU node pools with different MIG profiles tailored to different workload types (e.g., one pool of full GPUs for training and one pool of 1g or 2g MIG GPUs for inference).

## Appendix A: MIG Management via AKS Node Pool Provisioning (without GPU Operator MIG profiles)

Multi-Instance GPU (MIG) allows partitioning an NVIDIA A100 (and newer) GPU into multiple instances. AKS supports MIG for compatible GPU VM sizes (such as the ND A100 v4 series), but MIG must then be configured when provisioning the node pool – it cannot be changed on the fly. In this section, we show how to create a MIG-enabled node pool and integrate it with Kubernetes scheduling. We will not use the GPU Operator's dynamic MIG reconfiguration; instead, we set MIG at node pool creation time (the only option when MIG is managed by AKS itself rather than by the GPU Operator).

**Step 1: Provision an AKS node pool with a MIG profile.** Choose a MIG-capable VM size (for example, Standard_ND96isr_H100_v5), then use the Azure CLI to create a new node pool and specify `--gpu-instance-profile`:

```bash
az aks nodepool add \
  --resource-group <myResourceGroup> \
  --cluster-name <myAKSCluster> \
  --name migpool \
  --node-vm-size Standard_ND96isr_H100_v5 \
  --node-count 1 \
  --gpu-instance-profile MIG1g
```

In this example, we create a node pool named "migpool" with MIG profile MIG1g (each physical H100 GPU is split into 7 instances of 1g.10gb each). Important: you cannot change the MIG profile after the node pool is created. If you need a different MIG configuration (e.g., 2g.20gb or 4g.40gb instances), you must create a new node pool with the desired profile.

Note: MIG is only supported on Ubuntu-based AKS node pools (not on Azure Linux nodes), and currently the AKS cluster autoscaler does not support scaling MIG-enabled node pools. Plan capacity accordingly, since MIG node pools can't auto-scale.

## Appendix B: Key Points and Best Practices

**No on-the-fly profile changes:** With AKS, once a node pool is created with `--gpu-instance-profile MIGxg`, you cannot switch that node pool to a different MIG layout. If you need a new MIG profile, create a new node pool.

**--skip-gpu-driver-install:** This is typically used if you need a specific driver version, or if you want the GPU Operator to manage drivers (instead of the in-box AKS driver). Make sure your driver is installed before you schedule GPU workloads; if the driver is missing, pods that request GPU resources will fail to initialize.

**Driver versions for H100:** H100 requires driver branch R525 or newer (and CUDA 12+). Verify that the GPU Operator or your manual install uses a driver that supports H100 and, specifically, MIG on H100.
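One quick way to confirm which driver version the nodes ended up with is to query nvidia-smi from inside the cluster. This sketch assumes the GPU Operator's driver DaemonSet is running in the gpu-operator namespace; adjust the namespace or pod selection to your setup:

```bash
# Print the driver version reported by nvidia-smi on a GPU node
kubectl exec -n gpu-operator daemonset/nvidia-driver-daemonset -- \
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
```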
**Single vs. mixed strategy:** The single strategy lumps all MIG slices together as `nvidia.com/gpu`, which is simpler for uniform MIG node pools. The mixed strategy exposes resources like `nvidia.com/mig-1g.10gb`; use it if you need explicit scheduling by MIG slice type. Configure this in the GPU Operator's Helm values (e.g., `--set mig.strategy=single` or `--set mig.strategy=mixed`). If the Operator's MIG Manager is disabled, it won't attempt to reconfigure MIG, but the device plugin will still report the slices in single or mixed mode.

**Resource requests and scheduling:** With the single strategy, a pod that requests `nvidia.com/gpu: 1` is allocated a single 1g.10gb MIG slice on H100. With the mixed strategy, the same request must use the specific MIG resource name (e.g., `nvidia.com/mig-1g.10gb: 1`). If your pod requests `nvidia.com/gpu: 1` but the node only advertises `nvidia.com/mig-1g.10gb`, scheduling won't match – so be consistent in your pod specs.

**Cluster autoscaler:** Currently, MIG-enabled node pools have limited or no autoscaler support on AKS (the cluster autoscaler does not fully account for MIG resources). Scale these node pools manually or via custom logic. If you rely heavily on auto-scaling, consider a standard GPU node pool (no MIG), or plan capacity carefully to avoid needing dynamic scaling for MIG pools.

**Monitoring:** The GPU Operator deploys the DCGM exporter by default, which can collect MIG-specific metrics. Integrate it with Prometheus and Grafana for GPU usage dashboards. MIG slices are identified by unique device IDs in DCGM, so you can see which slices are busier than others, their memory usage, and so on.

**Node image upgrades:** Because you're skipping the AKS driver install, keep your GPU driver DaemonSet or Operator up to date. If you do a node image upgrade (or an AKS version upgrade), the OS image may change, requiring a matching (or recompiled) driver version. The GPU Operator normally handles this seamlessly by re-installing the driver on the new node image. Test upgrades in a staging cluster if possible, especially with new AKS releases or driver versions.

**Handling multiple node pools:** Many users create one node pool with full GPUs (no MIG) for large jobs and another MIG-enabled node pool for smaller parallel workloads. You can do this by repeating the steps above for each node pool, specifying different MIG profiles.

## References

- MIG User Guide
- NVIDIA GPU Operator with Azure Kubernetes Service
- ND-H100-v5 sizes series
- Create a multi-instance GPU node pool in Azure Kubernetes Service (AKS)