
Apps on Azure Blog

Azure Platform Metrics for AKS Control Plane Monitoring

By aritraghoshmicrosoft
Mar 12, 2025

Azure Kubernetes Service (AKS) now offers free platform metrics for monitoring your control plane components. This enhancement provides essential insights into the availability and performance of managed control plane components, such as the API server and etcd. In this blog post, we'll explore these new metrics and demonstrate how to leverage them to ensure the health and performance of your AKS clusters.

What's New?

Previously, detailed control plane metrics were only available through the paid Azure Managed Prometheus feature. Now, these metrics are automatically collected for free for all AKS clusters and are available for creating metric alerts. This democratizes access to critical monitoring data and helps all AKS users maintain more reliable Kubernetes environments.

Available Control Plane Metrics

The following platform metrics are now available for your AKS clusters:

Name | Display Name | Description
apiserver_memory_usage_percentage | API Server (PREVIEW) Memory Usage Percentage | Maximum memory percentage (based on current limit) used by the API server pod across instances
apiserver_cpu_usage_percentage | API Server (PREVIEW) CPU Usage Percentage | Maximum CPU percentage (based on current limit) used by the API server pod across instances
etcd_memory_usage_percentage | ETCD (PREVIEW) Memory Usage Percentage | Maximum memory percentage (based on current limit) used by the etcd pod across instances
etcd_cpu_usage_percentage | ETCD (PREVIEW) CPU Usage Percentage | Maximum CPU percentage (based on current limit) used by the etcd pod across instances
etcd_database_usage_percentage | ETCD (PREVIEW) Database Usage Percentage | Maximum utilization of the etcd database across instances

Accessing the New Platform Metrics

The metrics are automatically collected and available in the Azure Monitor Metrics explorer. Here's how to access them:

  1. Navigate to your AKS cluster in the Azure portal
  2. Select "Metrics" from the monitoring section

     

  3. In the Metric Namespace dropdown, select "Container Service", then in the Metric dropdown select any of the metrics listed above, e.g. "API Server (PREVIEW) Memory Usage Percentage". You can also choose the aggregation (Avg or Max) and the time range.
  4. The control plane metrics now appear in the Metric dropdown and can be charted.

     

 

These metrics can also be retrieved through the Azure Monitor metrics REST API or exported to other destinations.
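As a minimal sketch of the API route, the helper below only builds the request URL for the Azure Monitor Metrics "List" operation against an AKS cluster resource; it does not authenticate or send anything. The subscription ID, resource group, and cluster name are hypothetical placeholders, and the `api-version` shown is one published version of the metrics API, so check the current reference before relying on it.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute your own subscription, resource group, and cluster.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-resource-group"
CLUSTER_NAME = "my-aks-cluster"


def build_metrics_url(subscription_id, resource_group, cluster_name, metric_names,
                      aggregation="Maximum", interval="PT1M", timespan=None,
                      api_version="2018-01-01"):
    """Build (but do not call) an Azure Monitor metrics query URL for an AKS cluster."""
    # Resource ID of the managed cluster whose platform metrics we want.
    resource_id = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerService/managedClusters/{cluster_name}"
    )
    params = {
        "api-version": api_version,
        "metricnames": ",".join(metric_names),  # comma-separated metric names
        "metricnamespace": "Microsoft.ContainerService/managedClusters",
        "aggregation": aggregation,             # e.g. Maximum or Average
        "interval": interval,                   # ISO 8601 grain, e.g. PT1M
    }
    if timespan:
        # ISO 8601 interval such as "2025-03-12T00:00:00Z/2025-03-12T01:00:00Z".
        params["timespan"] = timespan
    return (f"https://management.azure.com{resource_id}"
            f"/providers/Microsoft.Insights/metrics?{urlencode(params)}")


example_url = build_metrics_url(
    SUBSCRIPTION_ID, RESOURCE_GROUP, CLUSTER_NAME,
    ["apiserver_memory_usage_percentage", "etcd_database_usage_percentage"],
)
print(example_url)
```

To fetch data, you would send an HTTP GET to this URL with an `Authorization: Bearer <token>` header; `az account get-access-token` is one way to obtain a token.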

Understanding Key Control Plane Metrics

API Server Memory Usage Percentage

The API server is the front-end for the Kubernetes control plane, processing all requests to the cluster. Monitoring its memory usage is critical because:

  • High memory usage can lead to API server instability and potential outages
  • Memory pressure may cause request latency or timeouts
  • Sustained high memory usage indicates potential scaling issues

A healthy API server typically keeps memory usage below 80%. Values consistently above this threshold warrant investigation and possible remediation. To investigate further, see the AKS troubleshooting guidance for the API server.
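"Consistently above" can be made concrete in many ways. The sketch below uses one hypothetical definition: a series of readings counts as sustained high usage when at least a given fraction of them exceed the threshold. The function name and the 0.8 fraction are illustrative assumptions, not part of AKS or Azure Monitor.

```python
def is_sustained_high(samples, threshold=80.0, min_fraction=0.8):
    """Return True when at least `min_fraction` of the samples exceed `threshold`.

    `samples` is a list of percentage readings (e.g. one per minute) of
    apiserver_memory_usage_percentage. The default 0.8 fraction is a
    hypothetical definition of "consistently above"; tune it as needed.
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


# A single spike is not flagged; a run of high readings is.
print(is_sustained_high([55, 60, 95, 58, 61]))  # False
print(is_sustained_high([85, 90, 82, 88, 91]))  # True
```

Separating short spikes from sustained pressure in this way mirrors the advice above: brief peaks are usually tolerable, while a long run near the limit points to a scaling issue.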

etcd Database Usage Percentage

etcd serves as the persistent storage for all Kubernetes cluster data. The etcd_database_usage_percentage metric is particularly important because:

  • etcd performance dramatically degrades as database usage approaches capacity
  • High database utilization can lead to increased latency for all cluster operations
  • Database size impacts backup and restore operations

Best practices suggest keeping the etcd database below 2 GB (absolute size) to ensure optimal performance. When usage exceeds this threshold, you can clean up unnecessary resources, reduce watch operations, and implement resource quotas and limits. The Diagnose and solve problems experience in the Azure portal provides detailed insights into the cause of etcd database saturation, and the AKS troubleshooting guidance for etcd covers further investigation steps.
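Because the platform metric reports a percentage while the guideline above is stated as an absolute size, a small conversion helps relate the two. This sketch assumes you know the etcd database quota that the percentage is measured against; the 8 GB value used below is a hypothetical figure for illustration, not a documented AKS limit.

```python
def percent_of_quota(size_gb, quota_gb):
    """Express an absolute etcd database size as a percentage of the quota."""
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    return size_gb / quota_gb * 100.0


def size_from_percent(percent, quota_gb):
    """Convert an etcd_database_usage_percentage reading back to gigabytes."""
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    return percent / 100.0 * quota_gb


HYPOTHETICAL_QUOTA_GB = 8.0  # assumption for illustration only

# With this assumed quota, 2 GB corresponds to 25% and a 75% reading to 6 GB.
print(percent_of_quota(2.0, HYPOTHETICAL_QUOTA_GB))    # 25.0
print(size_from_percent(75.0, HYPOTHETICAL_QUOTA_GB))  # 6.0
```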

Setting Up Alerts for Control Plane Metrics 

To proactively monitor your control plane, you can set up metric alerts:

  1. Navigate to your AKS cluster in the Azure Portal
  2. Select "Alerts" from the monitoring section

     

  3. Click on "Create" and select "Alert Rule"

     

  4. In Scope, confirm your subscription, resource group, and the resource type "Kubernetes service" (the current cluster is selected by default), then under Condition click "See all signals"

     

  5. Configure signal logic:
    • Select one of the control plane metrics (e.g., "API Server (PREVIEW) Memory Usage Percentage")
    • Set the condition (e.g., "Greater than")
    • Define the threshold (e.g., 80%)
    • Specify the evaluation frequency and window

       

  6. Define actions to take when the alert triggers

     

  7. Name and save your alert rule
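To build intuition for how the evaluation frequency and window interact, the sketch below simulates a static-threshold rule over per-minute readings: every `frequency` minutes it aggregates the last `window` minutes and compares the result with the threshold. This is a deliberate simplification for illustration; Azure Monitor's actual evaluation pipeline (aggregation granularity, data latency) is more involved, and the function and its parameters are hypothetical.

```python
from statistics import mean


def evaluate_alert(samples, threshold, window, frequency, aggregation="max"):
    """Simulate periodic evaluation of a 'Greater than' static-threshold alert.

    samples:     per-minute metric values (list index = minute)
    window:      minutes of data aggregated at each evaluation
    frequency:   minutes between evaluations
    aggregation: "max" or "avg", mirroring the Max/Avg choice in the portal
    Returns the evaluation times (in minutes) at which the condition was met.
    """
    if window <= 0 or frequency <= 0:
        raise ValueError("window and frequency must be positive")
    aggregate = {"max": max, "avg": mean}[aggregation]
    fired = []
    # The first evaluation happens once a full window of data exists.
    for t in range(window, len(samples) + 1, frequency):
        if aggregate(samples[t - window:t]) > threshold:
            fired.append(t)
    return fired


# 20 minutes of hypothetical readings with a 3-minute spike to 90%.
readings = [50.0] * 10 + [90.0] * 3 + [50.0] * 7
print(evaluate_alert(readings, threshold=80, window=5, frequency=1, aggregation="max"))
# [11, 12, 13, 14, 15, 16, 17]
print(evaluate_alert(readings, threshold=80, window=5, frequency=1, aggregation="avg"))
# []
```

The two calls show why the aggregation choice matters: with Max, a short spike trips the rule for as long as it stays inside the window, while with Avg the same spike is diluted below the threshold.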

Example Alert Configurations

API Server Memory Alert:

  • Signal: apiserver_memory_usage_percentage
  • Operator: Greater than
  • Threshold: 80%
  • Window: 5 minutes
  • Frequency: 1 minute
  • Severity: 2 (Warning)

ETCD Database Usage Alert:

  • Signal: etcd_database_usage_percentage
  • Operator: Greater than
  • Threshold: 75%
  • Window: 15 minutes
  • Frequency: 5 minutes
  • Severity: 2 (Warning)

You can also create these alerts with the Azure CLI, Azure PowerShell, or ARM templates.
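As a sketch of the ARM template route, the script below generates a template containing the two example rules as `Microsoft.Insights/metricAlerts` resources. The cluster and action group resource IDs are hypothetical placeholders, and the `Maximum` time aggregation is an assumption (the example configurations do not specify one). Review the output against the current metric alert template reference before deploying it.

```python
import json

# Hypothetical placeholders: replace with your own resource IDs.
CLUSTER_ID = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
              "/providers/Microsoft.ContainerService/managedClusters/my-aks-cluster")
ACTION_GROUP_ID = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
                   "/providers/Microsoft.Insights/actionGroups/my-action-group")


def minutes_to_iso(minutes):
    """Azure Monitor expects windows and frequencies as ISO 8601 durations."""
    return f"PT{minutes}M"


def metric_alert(name, metric, threshold, window_min, frequency_min,
                 severity=2, aggregation="Maximum"):
    """Return one static-threshold metric alert resource for the AKS cluster."""
    return {
        "type": "Microsoft.Insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "name": name,
        "location": "global",
        "properties": {
            "severity": severity,  # 2 = Warning
            "enabled": True,
            "scopes": [CLUSTER_ID],
            "evaluationFrequency": minutes_to_iso(frequency_min),
            "windowSize": minutes_to_iso(window_min),
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                "allOf": [{
                    "name": "criterion1",
                    "criterionType": "StaticThresholdCriterion",
                    "metricNamespace": "Microsoft.ContainerService/managedClusters",
                    "metricName": metric,
                    "operator": "GreaterThan",
                    "threshold": threshold,
                    "timeAggregation": aggregation,
                }],
            },
            "actions": [{"actionGroupId": ACTION_GROUP_ID}],
        },
    }


template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        # API Server Memory Alert: > 80%, 5-minute window, evaluated every minute.
        metric_alert("aks-apiserver-memory", "apiserver_memory_usage_percentage", 80, 5, 1),
        # ETCD Database Usage Alert: > 75%, 15-minute window, evaluated every 5 minutes.
        metric_alert("aks-etcd-db-usage", "etcd_database_usage_percentage", 75, 15, 5),
    ],
}

print(json.dumps(template, indent=2))
```

Saving the output as `alerts.json` would let you deploy it with `az deployment group create --resource-group my-rg --template-file alerts.json`, where `my-rg` is again a placeholder.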

Conclusion

The introduction of free Azure platform metrics for AKS control plane components is a significant enhancement to the monitoring capabilities available to all AKS users. By leveraging these metrics, particularly the API server memory usage and etcd database usage percentages, you can ensure the reliability and performance of your Kubernetes environments at no additional cost.

Start using these metrics today to gain deeper insights into your AKS clusters and set up proactive alerting to prevent potential issues before they impact your applications.

Learn More

For more detailed information, refer to the Azure Kubernetes Service and Azure Monitor documentation.

Updated Mar 11, 2025