Blog Post

Linux and Open Source Blog
4 MIN READ

How Microsoft Ensures the Quality of Linux VM Images and Platform Experiences on Azure?

KashanK's avatar
KashanK
Icon for Microsoft rankMicrosoft
Dec 09, 2024

 

In the continuously evolving landscape of cloud computing and AI, the quality and reliability of virtual machines (VMs) plays vital role for businesses running mission-critical workloads. With over 65% of Azure workloads running Linux our commitment to delivering high-quality Linux VM images and platforms remains unwavering. This involves overcoming unique challenges and implementing rigorous validation processes to ensure that every Linux VM image offered on Azure meets the high standards of quality and reliability.

Ensuring the quality of Linux images and the overall platform experience on Azure involves addressing the challenges posed by a unique platform stack and the complexity of managing and validating multiple independent release cycles. High-quality Linux VMs are essential for ensuring consistent performance, minimizing downtime and regressions, and enhancing security by addressing vulnerabilities with timely updates.

 

                                             Figure 1: Complexity of Linux VMs in Azure 

  • VM Image Updates: Azure's Marketplace offers a diverse array of  Linux distributions, each maintained by its respective publishers. These distributions release updates on their own schedules, independent of Azure's infrastructure updates.
  • Package Updates: Within each Linux distribution, numerous packages are maintained and updated separately, adding another layer of complexity to the update and validation process.
  • Extension and Agent Updates: Azure provides over 75+ guest VM extensions to enhance operating system capabilities, security, recovery etc. These extensions are updated independently, requiring careful validation to ensure compatibility and stability.
  • Azure Infrastructure Updates: Azure regularly updates its underlying infrastructure, including components like Azure Boost, to improve reliability, performance, and security.
  • VM SKUs and Sizes: Azure provides thousands of VM sizes with various combinations of CPU, memory, disk, and network configurations to meet diverse customer needs.

Managing concurrent updates across all VMs poses significant QA challenges. To address this, Azure uses rigorous testing, gating and validation processes to ensure all components function reliably and meet customer expectations.

Azure’s Approach to Overcoming Challenges

To address these challenges, we have implemented a comprehensive validation strategy that involves testing at every stage of the image and kernel lifecycle. By adopting a shift-left approach, we execute Linux VM-specific test cases as early as possible. This strategy helps us catch failures close to the source of changes before they are deployed to Azure fleet. Our validation gates integrate with various entry points and provide coverage for a wide variety of scenarios on Azure.

  • Upstream Kernel Validation: As a founding member of Kernel CI, Microsoft validates commits from Linux next and stable trees using Linux VMs in Azure and shares results with the community via Kernel CI DB. This enables us to detect regressions at early stages. 
  • Azure-Tuned Kernel Validation: Azure-Tuned Kernels provided by our endorsed distribution partners are thoroughly validated and signed off by Microsoft before it is released to the Azure fleet. 
  • Linux Guest Image Validation: The quality team works with endorsed distribution partners for major releases to conduct thorough validation. Each refreshed image, including those from third-party publishers, is validated and certified before being added to the marketplace. Automated pipelines are in place to validate the images once they are available in the Marketplace. 
  • Package Validation: Unattended Update: We conduct validation of packages updates with target distro to prevent regression and ensure that only tested snapshots are utilized for updating Linux VM in Azure.  
  • Guest Extension Validation: Every Azure-provided extensions undergoes Basic Validation Testing (BVT) across all images and kernel versions to ensure compatibility and functionality amidst any changes. Additionally, comprehensive release testing is conducted for major releases to maintain reliability and compatibility.
  • New VM SKU Validation: Any new VM SKU undergoes validation to confirm it supports Linux before its release to the Azure fleet. This process includes functionality, performance and stress testing across various Linux distributions, and compatibility tests with existing Linux images in the fleet.
  • Azure HostOS & Host Agent Validation: Updates to the Azure Host OS & Agents are thoroughly tested from the Linux guest OS perspective to confirm that changes in the Azure host environment do not result in regressions in compatibility, performance, or stability for Linux VMs. 

 

 

At any stage where regressions or bugs are identified, we block those releases to ensure they never reach customers. All issues are resolved and rigorously retested before images, kernels, or extension updates are made available. Through these robust validation processes, Azure ensures that Linux VMs consistently deliver to customer expectations, delivering a reliable, secure, and high-performance environment for mission-critical workloads.

Validation Tools for VM Guest Images and Kernel

To ensure the quality and reliability of Linux VM images and kernels on Azure, we leverage open-source kernel testing frameworks like LTP, kselftest, and fstest, along with extensive Azure-specific test cases available in  LISA, to comprehensively validate all aspects of the platforms.

LISA (Linux Integration Services Automation): Microsoft is committed to open source and that is no different with our testing framework LISA. LISA is an open-source core testing framework designed to meet all Linux validation needs. It includes over 400 tests covering performance, features and security, ensuring comprehensive validation of Linux images on Azure. By automating diverse test scenarios, LISA enables early detection and resolution of issues, enhancing the stability and performance of Linux VMs.

Conclusion

At Azure, Linux quality is a fundamental aspect of our commitment to delivering reliable VM images and platforms. Through comprehensive testing and strong collaboration with Linux distribution partners, we ensure quality and reliability of VMs while proactively identifying and resolving potential issues. This approach allows us to continually refine our processes and maintain the quality that customers expect from Azure.

Quality is a core focus, and we remain dedicated to continuous improvement, delivering world-class Linux environments to businesses and customers. For us, quality is not just a priority—it’s our standard.

Your feedback is invaluable, and we would greatly appreciate your insights.

Updated Dec 09, 2024
Version 1.0
No CommentsBe the first to comment