Azure Managed Grafana
Azure Managed Grafana Brings Grafana 11 and More
We’re thrilled to announce the public preview of Grafana 11 and several feature enhancements in Azure Managed Grafana based on your feedback. We continue to evolve our service to deliver what matters most to our customers.

Grafana 11

This annual major update to Grafana includes new functionality and improvements across dashboards, panels, queries, and alerts. The current preview in Managed Grafana offers Grafana v11.2. It includes the following key features:

- Explore Metrics
- Scenes-powered dashboards
- Subfolders
- Numerous improvements to canvas visualization and alerting

For more information on Grafana 11, please refer to What’s new in Grafana v11.0, v11.1, and v11.2, and consider how the breaking changes may impact your specific use cases.

You’ll need to create a new Managed Grafana instance to use the Grafana 11 preview. Upgrading directly from Grafana 10 isn’t supported yet. You can copy over dashboards from your current Managed Grafana instance by following the steps in Migrate to Azure Managed Grafana. Please note that not all Grafana 11 features are available in Managed Grafana at present; where applicable, more features will be added over time.

Azure Monitor Updates for Grafana 11

Improved Azure Monitor Logs visualizations

This update extends Azure Monitor Logs visualizations to support Basic Logs. You can now view Azure Monitor Log tables that have been configured with the lower-cost Basic Logs tier in Explore and in dashboard panels.

Additionally, Azure Monitor Logs details can now be viewed in Grafana Explore and Logs panels. You can filter query results by column values, run ad hoc statistics, and choose which columns to display using simple point-and-click interaction, without needing to modify the query text. Explore views also include options to view JSON data in dynamic columns. Azure Kubernetes Service users can leverage these views in a new Container Log dashboard.
Prometheus Exemplars support for Azure Monitor Application Insights traces

You can now drill down from Prometheus exemplars to Application Insights traces in Grafana. Using exemplars in your troubleshooting workflow shortens triage and analysis: you can navigate from metrics to sample traces related to errors and exceptions, and easily compare the performance of transactions. To take advantage of this capability, the application needs to be instrumented to emit Prometheus metrics with exemplars, and traces to Azure Monitor Application Insights. Sign up for the Private Preview of Exemplars support in your Azure Monitor Workspace.

User-Assigned Managed Identity

Since its inception, Managed Grafana has set up a system-assigned managed identity for each new Grafana workspace by default. You can use this managed identity as the security principal to access backend data sources connected to your workspace. While it’s convenient to use, a system-assigned managed identity isn’t always suitable. Enterprise customers who have stricter identity management policies typically create and manage all Entra ID identities themselves. Managed Grafana now allows these customers to use identities defined in their Entra ID tenants instead. With the user-assigned managed identity feature, you can select an existing Entra ID identity to be used for authentication and authorization with your data sources.

Please note that you can choose only one type of managed identity for each workspace. You can’t enable both system-assigned and user-assigned managed identities simultaneously.

Grafana Settings

Grafana server settings allow you to customize specific server behaviors. Managed Grafana configures and manages these settings automatically, so you don’t have to deal with them. For some settings, however, the right value varies from user to user. Managed Grafana now gives you the option to change their default values.
The currently supported ones are:

- viewers_can_edit: determines whether users with the Grafana Viewer role can edit dashboards
- external_enabled: controls the public sharing of snapshots

Grafana Migration Tool

If you have a self-hosted Grafana server, on-premises or in the cloud, that you’d like to migrate to Managed Grafana, you can perform this operation with one command in the Azure CLI. The new az grafana migrate command automates the process of copying your existing dashboards from any Grafana server to your Managed Grafana workspace. It supports several options that control how the content migration is conducted, as well as a dry-run option that lets you preview the migration results before committing to the operation.

Let Us Know How We’re Doing

If you’re a current user of Managed Grafana, we’d love to hear from you. Please take a moment to fill out this online survey. It will help us improve our service to better serve you. Thank you!

General Availability: Kubernetes Metadata and Logs Filtering in Azure Monitor-Container Insights
Today at Ignite, we are thrilled to announce the General Availability of Kubernetes Metadata and Logs Filtering in Azure Monitor – Container Insights! This enhancement brings additional Kubernetes metadata to the ContainerLogV2 schema, including PodLabels, PodAnnotations, PodUid, Image, ImageID, ImageRepo, and ImageTag. Moreover, the new Logs Filtering feature allows for precise filtering of both workload and system pods/containers. Together, these advancements give users richer context and better visibility into their workloads, and they make troubleshooting easier by providing deeper insight into the Kubernetes environment.

Key Features

- Enhanced ContainerLogV2 schema with Kubernetes Metadata Fields: Detailed metadata fields enhance log analysis. These include “podLabels,” “podAnnotations,” “podUid,” “image,” “imageID,” “imageRepo,” and “imageTag.”
- Customized Include List Configuration: Users can tailor the collected metadata fields via ConfigMap. All fields are collected by default.
- Enhanced ContainerLogV2 schema with Log Level: Assess application health with color-coded severity levels (e.g., CRITICAL, ERROR, WARNING). This helps with incident response and proactive monitoring.
- Annotation Based Log Filtering for workloads: Filter logs efficiently through podAnnotations. Focus on relevant information while optimizing costs and resource usage.
- ConfigMap Based Log Filtering for platform logs (System Kubernetes Namespaces): Configure log collection for specific pods within the system namespaces through ConfigMap.
- Grafana Dashboard for Visualization: Use the Grafana dashboard to visualize log levels, log volume, rate, records, and more for in-depth analysis and real-time monitoring.

To learn more and enable this new feature, please visit our Kubernetes Metadata and Logs Filtering Documentation.
If you have any questions or feedback on Kubernetes Logs Metadata and Filtering, please reach out to ibraraslam@microsoft.com or fill out this survey!

Monitoring GPU Metrics in AKS with Azure Managed Prometheus, DCGM Exporter and Managed Grafana
Azure Monitor managed service for Prometheus provides a production-grade solution for monitoring without the hassle of installation and maintenance. By leveraging these managed services, you can focus on extracting insights from your metrics and logs rather than managing the underlying infrastructure.

Integrating essential GPU metrics (such as Framebuffer Memory Usage, GPU Utilization, Tensor Core Utilization, and SM Clock Frequencies) into Azure Managed Prometheus and Grafana makes GPU consumption patterns visible, enabling more informed decisions about optimization and resource allocation.

Azure Managed Prometheus recently announced general availability of Operator and CRD support, which enables customers to customize metrics collection and to scrape metrics from workloads and applications using Service and Pod Monitors, similar to the OSS Prometheus Operator.

This blog demonstrates how we leveraged the CRD/Operator support in Azure Managed Prometheus, together with the Nvidia DCGM Exporter and Grafana, to enable GPU monitoring.

GPU monitoring

As the use of GPUs has skyrocketed for deploying large language models (LLMs) for both inference and fine-tuning, monitoring these resources becomes critical to ensure optimal performance and utilization. Prometheus, an open-source monitoring and alerting toolkit, coupled with Grafana, a powerful dashboarding and visualization tool, provides an excellent solution for collecting, visualizing, and acting on these metrics.

Essential metrics such as Framebuffer Memory Usage, GPU Utilization, Tensor Core Utilization, and SM Clock Frequencies serve as fundamental indicators of GPU consumption. They offer insight into the performance and efficiency of the GPUs, which in turn helps us reduce our COGS and improve operations.
Using Nvidia’s DCGM Exporter with Azure Managed Prometheus

The DCGM Exporter is a tool developed by Nvidia to collect and export GPU metrics. It runs as a pod on Kubernetes clusters and gathers various metrics from Nvidia GPUs, such as utilization, memory usage, temperature, and power consumption. These metrics are crucial for monitoring and managing the performance of GPUs. You can integrate this exporter with Azure Managed Prometheus. The sections below describe the steps and changes needed to deploy the DCGM Exporter successfully.

Prerequisites

Before we jump straight to the installation, ensure your AKS cluster meets the following requirements:

- GPU Node Pool: Add a node pool with the required VM SKU that includes GPU support.
- GPU Driver: Ensure the NVIDIA Kubernetes device plugin is running as a DaemonSet on your GPU nodes.
- Enable Azure Managed Prometheus and Azure Managed Grafana on your AKS cluster.

Refactoring Nvidia DCGM Exporter for AKS: Code Changes and Deployment Guide

Updating API Versions and Configurations for Seamless Integration

As per the official documentation, the best way to get started with the DCGM Exporter is to install it using Helm. When installing on AKS with Managed Prometheus, you might encounter the error below:

    Error: Installation Failed: Unable to build Kubernetes objects from release manifest: resource mapping not found for name: "dcgm-exporter-xxxxx" namespace: "default" from "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1". Ensure CRDs are installed first.

The error occurs because the chart's ServiceMonitor targets the open-source Prometheus Operator API group, which is not installed on the cluster. To resolve this, make the following changes in the DCGM Exporter code:

1. Clone the Project: Go to the GitHub repository of the DCGM Exporter and clone the project or download it to your local machine.
2. Navigate to the Template Folder: The code used to deploy the DCGM Exporter is located in the template folder within the deployment folder.
3. Modify the service-monitor.yaml File: Find the file service-monitor.yaml.
The apiVersion key in this file needs to be updated from monitoring.coreos.com/v1 to azmonitoring.coreos.com/v1. This change allows the DCGM Exporter to use the Azure Managed Prometheus CRD.

    apiVersion: azmonitoring.coreos.com/v1

4. Handle Node Selectors and Tolerations: GPU node pools often have taints and node labels. Modify the values.yaml file in the deployment folder so that the exporter pods select and tolerate the GPU nodes:

    nodeSelector:
      accelerator: nvidia
    tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"

Helm: Packaging, Pushing, and Installation on Azure Container Registry

We followed the MS Learn documentation for pushing and installing the package through Helm on Azure Container Registry. For a comprehensive understanding, you can refer to the documentation. Here are the quick steps for installation.

After making all the necessary changes in the deployment folder of the source code, change to that directory to package the chart, and log in to your registry.

1. Package the Helm chart and log in to your container registry:

    helm package .
    helm registry login <container-registry-url> --username $USER_NAME --password $PASSWORD

2. Push the Helm chart to the registry:

    helm push dcgm-exporter-3.4.2.tgz oci://<container-registry-url>/helm

3. Verify in the Azure portal that the package has been pushed to the registry.

4. Install the chart and verify the installation:

    helm install dcgm-nvidia oci://<container-registry-url>/helm/dcgm-exporter -n gpu-resources

    # Check the installation on your AKS cluster:
    helm list -n gpu-resources

    # Verify the DCGM Exporter pods and DaemonSet:
    kubectl get po -n gpu-resources
    kubectl get ds -n gpu-resources

You can now check that the DCGM Exporter is running on the GPU nodes as a DaemonSet.

Exporting GPU Metrics and Configuring Azure Managed Grafana Dashboard

Once the DCGM Exporter DaemonSet is running across all GPU node pools, you need to export the GPU metrics generated by this workload to Azure Managed Prometheus.
This is accomplished by deploying a PodMonitor resource. Follow these steps:

1. Deploy the PodMonitor: Apply the following YAML configuration to deploy the PodMonitor:

    apiVersion: azmonitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: nvidia-dcgm-exporter
      labels:
        app.kubernetes.io/name: nvidia-dcgm-exporter
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: nvidia-dcgm-exporter
      podMetricsEndpoints:
      - port: metrics
        interval: 30s
      podTargetLabels:

2. Check that the PodMonitor is deployed by executing:

    kubectl get podmonitor -n <namespace>

3. Verify metrics export: Confirm that the metrics are being exported to Azure Managed Prometheus by navigating to the "Metrics" page of your Azure Monitor Workspace in the portal.

Create the DCGM Dashboard on Azure Managed Grafana

The GitHub repository for the DCGM Exporter includes a JSON file for the Grafana dashboard. Follow the MS Learn documentation to import this JSON into your Managed Grafana instance. After you import the JSON, the dashboard displaying GPU metrics will be visible in Grafana.

Azure Monitor cost optimization using Azure Advisor
Azure Advisor is a free offering that can help you avoid problems and save money by providing you with proactive best practice guidance. We in Azure Monitor are committed to assisting you in optimizing your budget allocation, making informed decisions about monitoring options, and discovering features and configurations that enable you to get more out of your infrastructure.

Introducing Query editor: Empowering Users with PromQL in Azure Monitor Metrics!
We're thrilled to announce the public preview launch of Query Editor in Azure Monitor Metrics, an advanced feature that allows customers to write and run PromQL queries directly within their Azure Monitor workspace (AMW). This long-awaited addition comes as a direct response to growing demand from our customers, and we're excited to finally deliver this capability to you.

What’s new?

Unlocking the Power of PromQL: Prometheus Query Language (PromQL) has emerged as a standard in monitoring and observability, offering users flexibility and expressiveness in querying metric data. With the Query Editor in Azure Monitor Metrics, users can now harness the full potential of PromQL to derive actionable insights for their resources. Previously, users could not query, from within the Azure portal, the Prometheus metrics that their AKS or Arc-enabled clusters sent to an Azure Monitor workspace via Azure Managed Prometheus. With this new capability, users can query Prometheus metrics for their AKS resources or Arc-enabled clusters directly in the Query Editor within the portal.

Seamless Querying Experience: With the Query Editor, users can compose and execute PromQL queries directly within the Azure Monitor workspace they are emitting metrics to. This streamlines the monitoring workflow, enabling users to stay focused and productive without the hassle of context switching while querying different types of metric data.

Benefits of Query editor with PromQL:

- Rich Query Language: PromQL offers a rich set of functions and operators for querying metric data, allowing users to perform complex aggregations, transformations, and calculations with ease.
- Familiarity and Interoperability: For users familiar with Prometheus-based monitoring solutions, the Query Editor provides a familiar environment for querying Azure metrics, facilitating a smoother transition and interoperability between platforms.

How does it work?

Using the Query Editor is simple.
Just navigate to your Azure Monitor workspace (AMW), select the Azure Monitor Metrics Query Editor, and start writing your PromQL queries.

Get Started Today: The public preview of Query Editor in Azure Monitor Metrics is now available, and we invite you to try it out and share your feedback with us. Your input is invaluable as we continue to refine and improve this feature to better serve your monitoring and analytics needs.

Please note that the Query Editor currently supports querying only metrics stored in an Azure Monitor Workspace. We plan to add support for platform metrics in the future.

https://aka.ms/queryEditorPreview
https://learn.microsoft.com/en-Us/azure/azure-monitor/essentials/azure-monitor-workspace-overview?tabs=azure-portal

Stay tuned for more updates and enhancements as we work towards delivering even more value to our Azure customers.

Azure Managed Grafana Adds New SKU and Features
On the first anniversary of Azure Managed Grafana general availability, we’re excited to announce the preview of the Essential SKU as well as numerous new functionalities and enhancements. These additions bring more choices and value to our users.

Essential SKU

We’ve observed that many current users of Managed Grafana exclusively or primarily view data from Azure Monitor services, including the managed service for Prometheus. These users don’t require the full extensibility Grafana provides. We’re introducing a simplified Managed Grafana SKU called Essential that focuses on Azure Monitor dashboarding. Because it consumes fewer resources, the Essential SKU is more cost-effective than the Standard SKU for development and testing uses. You pay only for active usage, based on the number of actual users you have in each calendar month.

You can have one Essential SKU workspace per Azure subscription. To create an Essential SKU workspace, choose “Essential” in the Pricing Plans during a new deployment. You may also upgrade an Essential SKU workspace to Standard at a later point, as your needs for Grafana functionality change over time.

Grafana Version 10

Users creating new deployments of Azure Managed Grafana Standard SKU can now choose Grafana version 10 for access to a wide range of features and enhancements, including time region support and annotation filtering improvements. With this opt-in model, you have the flexibility to use a newer Grafana version well before it becomes mandatory for all Managed Grafana instances.

Existing Managed Grafana workspaces running on Grafana version 9 can’t be upgraded in place to version 10 yet. We’ll announce when this is supported, once we’ve resolved all the migration issues.

Team Sync with Entra Groups

Grafana team sync is a Grafana Pro feature that lets you connect teams in Grafana to an external identity provider. You typically set up Grafana teams to manage permissions for accessing Grafana dashboards at a better scale.
With regular Grafana teams, however, you have to configure and maintain their member lists as a separate task. Managed Grafana utilizes team sync to allow you to define Grafana teams as Entra (formerly Azure Active Directory) groups. With this functionality, you can manage your groups and memberships centrally and with greater consistency.

Plugin Management

Until very recently, Managed Grafana packaged and executed all plugins as a part of the Grafana software image that’s deployed to all workspace instances. That approach increased both the image size and the startup time as the number of plugins grew. With plugin management, Managed Grafana pre-installs a smaller set of plugins, and you can add optional plugins to your Grafana workspace after it’s created. Initially, only plugins that have been reviewed by the Managed Grafana team are available for installation.

Application Insights Traces in Grafana

Starting with Grafana 10, Azure Monitor users can accelerate their application performance triage and troubleshooting workflows by searching, filtering, and viewing Application Insights traces natively in Grafana’s trace visualizations and in Tracing in Explore.

Any application being monitored by Azure Monitor Application Insights stores its logs and traces in an Application Insights resource. Prior to Grafana 10, these application traces could only be queried and searched for as logs, and viewed in Grafana’s standard time series and table visualization panels.

Search for traces in Azure Monitor query wizard

In Grafana 10, users can now filter and isolate traces of interest with the new trace query wizard in Explore or dashboard panels.

View Traces

After identifying an anomalous trace, users can choose to view trace details directly in Grafana using the split view in Explore or by opening a new tab. Click on the itemId field to view all the spans within the end-to-end trace.
View Trace details

The selected trace or span opens in Grafana’s dedicated trace viewer, where you can use the Gantt chart view to understand how much delay individual microservices and dependencies contribute. This is a fast and efficient approach to root cause analysis when you need to identify why a specific transaction was slow.

Start with new out-of-the-box dashboards

To help you get started working with App Insights traces in Grafana, there are now 5 new dashboards that are shipped with the Azure Monitor data source plugin and available for import from Dashboards | Grafana Labs. Each of these dashboards provides a curated experience to triage performance problems or failures and allows users to click on a single operationID to view the trace details.

- Azure / Insights / Applications - Performance - 1. Operations | Grafana Labs
- Azure / Insights / Applications - Performance - 2. Dependencies | Grafana Labs
- Azure / Insights / Applications - Failures - 1. Operations | Grafana Labs
- Azure / Insights / Applications - Failures - 2. Dependencies | Grafana Labs
- Azure / Insights / Applications - Failures - 3. Exceptions | Grafana Labs

User-Based Authentication for Azure Data Explorer (ADX)

Grafana normally uses a special identity (e.g., managed identity or app registration) as the access credential for a data source. The ADX plugin is the first data source plugin to support user-based authentication and to enable end-to-end permission control for the currently signed-in user. Most ADX customers have already configured their clusters for user access. User-based authentication for ADX allows those same settings to be applied to Grafana access to ADX. It not only simplifies the data source access configuration but also makes it possible to apply policies (e.g., audit logging) based on who the user is.

It’s worth noting that user-based authentication doesn’t work with Grafana automation features, such as alerts or reporting.
If you need these features, you’ll have to set up a separate ADX data source with a managed identity or app registration and use that data source for them.

User Deduplication in Billing

One customer pain point we’ve heard is that Managed Grafana charges for each active user in each workspace. If a user interacts with two workspaces, that user is billed twice. The Managed Grafana service is region-based, so there is no easy way for it to enumerate all workspaces belonging to a customer. It can, however, do this reliably at the Azure subscription level. Starting with this December’s billing cycle, Managed Grafana will charge each user once, based on the first workspace in a subscription they sign in to. A user can access any number of Standard SKU workspaces within the same subscription, and they will be counted as one active user for billing purposes.

Try Them Out Today

With these updates, along with private link and managed private endpoint that we released earlier this year, we hope that Azure Managed Grafana will better meet your visualization needs. Give them a try and leave us a comment to let us know what you think.
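
As a rough illustration of the user deduplication in billing described above, the sketch below counts billable active users under both models. The sign-in records, subscription names, workspace names, and user names are all hypothetical, and the script only mirrors the counting rule stated in the post; it is not how the service computes charges.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sign-in records for one month: subscription,workspace,user
records='sub-a,ws-1,alice
sub-a,ws-2,alice
sub-a,ws-1,bob
sub-b,ws-3,alice'

# Previous model: one billable user per distinct (workspace, user) pair.
per_workspace=$(printf '%s\n' "$records" | awk -F, '{print $2 "," $3}' | sort -u | wc -l | tr -d ' ')

# Deduplicated model: one billable user per distinct (subscription, user) pair.
per_subscription=$(printf '%s\n' "$records" | awk -F, '{print $1 "," $3}' | sort -u | wc -l | tr -d ' ')

echo "billed per workspace:    $per_workspace"
echo "billed per subscription: $per_subscription"
```

In this made-up data set, alice uses two workspaces in sub-a, so she is counted twice under the per-workspace model and once under the per-subscription model. Her sign-in to sub-b is still counted separately, because deduplication applies within a subscription only.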