Azure Kubernetes Service (AKS) now offers free platform metrics for monitoring your control plane components. This enhancement provides essential insights into the availability and performance of managed control plane components, such as the API server and etcd. In this blog post, we'll explore these new metrics and demonstrate how to leverage them to ensure the health and performance of your AKS clusters.
What's New?
Previously, detailed control plane metrics were only available through the paid Azure Managed Prometheus feature. Now, these metrics are automatically collected for free for all AKS clusters and are available for creating metric alerts. This democratizes access to critical monitoring data and helps all AKS users maintain more reliable Kubernetes environments.
Available Control Plane Metrics
The following platform metrics are now available for your AKS clusters:
Name | Display Name | Description |
---|---|---|
apiserver_memory_usage_percentage | API Server (PREVIEW) Memory Usage Percentage | Maximum memory percentage (based on the current limit) used by the API server pod across instances |
apiserver_cpu_usage_percentage | API Server (PREVIEW) CPU Usage Percentage | Maximum CPU percentage (based on the current limit) used by the API server pod across instances |
etcd_memory_usage_percentage | ETCD (PREVIEW) Memory Usage Percentage | Maximum memory percentage (based on the current limit) used by the ETCD pod across instances |
etcd_cpu_usage_percentage | ETCD (PREVIEW) CPU Usage Percentage | Maximum CPU percentage (based on the current limit) used by the ETCD pod across instances |
etcd_database_usage_percentage | ETCD (PREVIEW) Database Usage Percentage | Maximum utilization of the ETCD database across instances |
Accessing the New Platform Metrics
The metrics are automatically collected and available in the Azure Monitor Metrics explorer. Here's how to access them:
- Navigate to your AKS cluster in the Azure portal
- Select "Metrics" from the monitoring section
- In the Metric Namespace dropdown, choose Container Service. The control plane metrics then become available for selection in the Metric dropdown.
- Choose any of the metrics listed above (e.g., API Server (PREVIEW) Memory Usage Percentage), along with your desired aggregation (Avg or Max) and time range.
These metrics can also be retrieved through the platform metrics API or exported to other destinations.
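As a minimal sketch of retrieving these metrics programmatically, the Python snippet below only builds a request URL for the Azure Monitor metrics REST endpoint (`/providers/Microsoft.Insights/metrics`) of an AKS cluster; it does not send the request. The `api-version` value, the one-minute `interval`, and the `Maximum` aggregation are assumptions to check against the current Azure Monitor REST API reference, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: public-cloud ARM endpoint and a metrics api-version; verify both
# against the Azure Monitor REST API reference before relying on them.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-01-01"


def aks_resource_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Build the ARM resource ID of an AKS managed cluster."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerService/managedClusters/{cluster_name}"
    )


def metrics_url(resource_id: str, metric_names: list[str], timespan: str,
                interval: str = "PT1M", aggregation: str = "Maximum") -> str:
    """Build a GET URL asking for the given platform metrics of one resource."""
    query = urlencode({
        "api-version": API_VERSION,
        "metricnames": ",".join(metric_names),  # several metrics in one request
        "timespan": timespan,                   # ISO 8601 "start/end" interval
        "interval": interval,                   # ISO 8601 duration per data point
        "aggregation": aggregation,             # Maximum mirrors the "Max" option in the portal
    })
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


# Usage with placeholder identifiers:
rid = aks_resource_id("00000000-0000-0000-0000-000000000000", "my-rg", "my-aks")
print(metrics_url(
    rid,
    ["apiserver_memory_usage_percentage", "etcd_database_usage_percentage"],
    timespan="2025-03-10T00:00:00Z/2025-03-11T00:00:00Z",
))
```

Sending the request requires a Microsoft Entra ID bearer token in the `Authorization` header, for example one obtained with `az account get-access-token`.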
Understanding Key Control Plane Metrics
API Server Memory Usage Percentage
The API server is the front-end for the Kubernetes control plane, processing all requests to the cluster. Monitoring its memory usage is critical because:
- High memory usage can lead to API server instability and potential outages
- Memory pressure may cause request latency or timeouts
- Sustained high memory usage indicates potential scaling issues
A healthy API server typically maintains memory usage below 80%. Values consistently above this threshold warrant investigation and potential remediation. To investigate the issue further, follow the guide here.
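To make "consistently above this threshold" concrete, the sketch below checks a series of API server memory samples against the 80% guideline. The function name, the `min_fraction` parameter, and the assumption that the input is a list of per-minute Max values of `apiserver_memory_usage_percentage` are illustrative, not part of AKS.

```python
from typing import Sequence


def is_sustained_high(samples: Sequence[float], threshold: float = 80.0,
                      min_fraction: float = 1.0) -> bool:
    """Return True when at least `min_fraction` of the samples exceed `threshold`.

    Hypothetical helper: `samples` is assumed to hold per-minute Max values of
    apiserver_memory_usage_percentage. With the default min_fraction=1.0,
    every sample must be above the threshold.
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


# Usage with made-up sample values:
print(is_sustained_high([82.0, 85.5, 81.2]))                    # True
print(is_sustained_high([82.0, 60.0, 81.2]))                    # False
print(is_sustained_high([82.0, 60.0, 81.2], min_fraction=0.6))  # True
```

Requiring every sample to exceed the threshold ignores brief spikes; lowering `min_fraction` makes the check more sensitive at the cost of more false positives.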
etcd Database Usage Percentage
etcd serves as the persistent storage for all Kubernetes cluster data. The etcd_database_usage_percentage metric is particularly important because:
- etcd performance dramatically degrades as database usage approaches capacity
- High database utilization can lead to increased latency for all cluster operations
- Database size impacts backup and restore operations
Best practices suggest keeping etcd database usage below 2GB (absolute usage) to ensure optimal performance. When usage exceeds this threshold, you can clean up unnecessary resources, reduce watch operations, and implement resource quotas and limits. The Diagnose and Solve experience in the Azure portal provides detailed insights into the cause of etcd database saturation. To investigate this issue further, follow the guide here.
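Because `etcd_database_usage_percentage` reports a percentage while the guideline above is an absolute size, you need the database size limit to compare the two. The sketch below assumes you know that limit: `quota_bytes` is a hypothetical input whose value the post does not state, and "2GB" is treated as 2 GiB.

```python
GIB = 1024 ** 3


def etcd_usage_bytes(usage_percentage: float, quota_bytes: int) -> float:
    """Convert etcd_database_usage_percentage into an absolute size in bytes.

    `quota_bytes` is an assumed input (the etcd database size limit for your
    cluster); the metric itself only reports a percentage of that limit.
    """
    return usage_percentage / 100.0 * quota_bytes


def exceeds_recommended_size(usage_percentage: float, quota_bytes: int,
                             limit_bytes: int = 2 * GIB) -> bool:
    """Compare the converted size with the 2GB guideline (interpreted as 2 GiB)."""
    return etcd_usage_bytes(usage_percentage, quota_bytes) > limit_bytes


# Usage with a purely illustrative 8 GiB limit:
print(etcd_usage_bytes(25.0, 8 * GIB) / GIB)        # 2.0
print(exceeds_recommended_size(30.0, 8 * GIB))      # True
```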
Setting Up Alerts for Control Plane Metrics
To proactively monitor your control plane, you can set up metric alerts:
- Navigate to your AKS cluster in the Azure portal
- Select "Alerts" from the monitoring section
- Click on "Create" and select "Alert Rule"
- On the Scope tab, confirm your subscription, resource group, and the resource type "Kubernetes service" (selected by default), then select "See all signals" under Conditions
- Configure the signal logic:
  - Select one of the control plane metrics (e.g., "API Server Memory Usage Percentage")
  - Set the condition (e.g., "Greater than")
  - Define the threshold (e.g., 80%)
  - Specify the evaluation frequency and window
- Define actions to take when the alert triggers
- Name and save your alert rule
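In a simplified model, the window and evaluation frequency work together like this: at every evaluation, the samples in the most recent window are aggregated and compared with the threshold. The sketch below simulates that on per-minute samples to build intuition. It assumes one sample per minute and only the "Greater than" operator; it is not Azure Monitor's actual evaluation engine, and `MetricAlertRule` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MetricAlertRule:
    """Hypothetical, simplified model of a static-threshold metric alert."""
    threshold: float
    window_minutes: int
    frequency_minutes: int
    # Aggregation applied to the samples in each window (Max by default).
    aggregate: Callable[[Sequence[float]], float] = max


def evaluate(rule: MetricAlertRule, samples: Sequence[float]) -> list[int]:
    """Return the minutes at which the rule would fire.

    samples[i] is assumed to be the metric value for minute i. Every
    `frequency_minutes`, the last `window_minutes` samples are aggregated and
    compared with the threshold using "greater than".
    """
    fired = []
    for t in range(rule.window_minutes, len(samples) + 1, rule.frequency_minutes):
        window = samples[t - rule.window_minutes:t]
        if rule.aggregate(window) > rule.threshold:
            fired.append(t)
    return fired


# Usage with made-up API server memory samples (one per minute):
samples = [70, 72, 75, 78, 79, 85, 70, 70, 70, 70, 70]
max_rule = MetricAlertRule(threshold=80.0, window_minutes=5, frequency_minutes=1)
avg_rule = MetricAlertRule(80.0, 5, 1, aggregate=lambda w: sum(w) / len(w))
print(evaluate(max_rule, samples))  # [6, 7, 8, 9, 10]
print(evaluate(avg_rule, samples))  # []
```

In this simulation, a single 85% sample keeps a Max-based rule firing for five consecutive evaluations, while an Avg-based rule over the same window never fires, which is why the aggregation is worth choosing deliberately.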
Example Alert Configurations
API Server Memory Alert:
- Signal: apiserver_memory_usage_percentage
- Operator: Greater than
- Threshold: 80%
- Window: 5 minutes
- Frequency: 1 minute
- Severity: 2 (Warning)
ETCD Database Usage Alert:
- Signal: etcd_database_usage_percentage
- Operator: Greater than
- Threshold: 75%
- Window: 15 minutes
- Frequency: 5 minutes
- Severity: 2 (Warning)
You can also create these alerts through the Azure CLI, Azure PowerShell, or ARM templates.
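For the Azure CLI route, the two example configurations above can be expressed as `az monitor metrics alert create` invocations. The sketch below only builds the argument list and does not run `az`. The alert names, the choice of `max` as the aggregation inside `--condition`, and the `AlertConfig` type are assumptions for illustration, and an action group (`--action`) would still need to be added.

```python
import shlex
from dataclasses import dataclass


@dataclass
class AlertConfig:
    """Hypothetical container mirroring the example alert configurations."""
    name: str
    metric: str
    operator: str      # e.g. ">" for "Greater than"
    threshold: float
    window: str        # --window-size, e.g. "5m"
    frequency: str     # --evaluation-frequency, e.g. "1m"
    severity: int      # 2 = Warning
    aggregation: str = "max"  # assumed aggregation for the condition


def az_alert_command(cfg: AlertConfig, resource_group: str, scope: str) -> list[str]:
    """Build the argv for `az monitor metrics alert create` (not executed here)."""
    condition = f"{cfg.aggregation} {cfg.metric} {cfg.operator} {cfg.threshold:g}"
    return [
        "az", "monitor", "metrics", "alert", "create",
        "--name", cfg.name,
        "--resource-group", resource_group,
        "--scopes", scope,  # ARM resource ID of the AKS cluster
        "--condition", condition,
        "--window-size", cfg.window,
        "--evaluation-frequency", cfg.frequency,
        "--severity", str(cfg.severity),
    ]


EXAMPLES = [
    AlertConfig("apiserver-memory-high", "apiserver_memory_usage_percentage", ">", 80, "5m", "1m", 2),
    AlertConfig("etcd-db-usage-high", "etcd_database_usage_percentage", ">", 75, "15m", "5m", 2),
]

# Usage with placeholder identifiers: print shell-quoted commands for review.
scope = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
         "/providers/Microsoft.ContainerService/managedClusters/my-aks")
for cfg in EXAMPLES:
    print(shlex.join(az_alert_command(cfg, "my-rg", scope)))
```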
Conclusion
The introduction of free Azure platform metrics for AKS control plane components is an enhancement to the monitoring capabilities available to all AKS users. By leveraging these metrics, particularly API server memory usage and etcd database usage percentages, you can ensure the reliability and performance of your Kubernetes environments without additional cost.
Start using these metrics today to gain deeper insights into your AKS clusters and set up proactive alerting to prevent potential issues before they impact your applications.
Learn More
For more detailed information, refer to the following documentation:
Updated Mar 11, 2025
Version 1.0
aritraghoshmicrosoft
Microsoft
Apps on Azure Blog