best practices
Superfast: Using Web App and Managed Identity to invoke Function App triggers
TOC: Introduction | Setup | References

1. Introduction

Many enterprises prefer not to use App Keys to invoke Function App triggers, as they are concerned that these fixed strings might be exposed. This method allows you to invoke Function App triggers using Managed Identity for enhanced security. I will provide examples in both Bash and Node.js.

2. Setup

1. Create a Linux Python 3.11 Function App

1.1. Configure Authentication to block unauthenticated callers while allowing the Web App's Managed Identity to authenticate:

- Identity provider: Microsoft
- Choose a tenant for your application and its users: Workforce
- App registration type: Create
- Name: [automatically generated]
- Client secret expiration: [fit your business purpose]
- Supported account type: Any Microsoft Entra directory - Multi-tenant
- Client application requirement: Allow requests from any application
- Identity requirement: Allow requests from any identity
- Tenant requirement: Use default restrictions based on issuer
- Token store: [checked]

1.2. Create an anonymous trigger. Since your app is already protected by the App Registration, additional Function App-level protection is unnecessary; otherwise, you would also need a Function Key to invoke it.

1.3. Once the Function App is configured, try accessing the endpoint directly—you should receive a 401 Unauthorized error, confirming that triggers cannot be invoked without proper Managed Identity authorization.

1.4. After making these changes, wait 10 minutes for the settings to take effect.

2. Create a Linux Node.js 20 Web App, Obtain an Access Token, and Invoke the Function App Trigger (Bash Example)

2.1. Enable System Assigned Managed Identity in the Web App settings.

2.2. Open the Kudu SSH console for the Web App.

2.3. Run the following commands, making the necessary modifications:

- subscriptionsID → Replace with your Subscription ID.
- resourceGroupsID → Replace with your Resource Group ID.
- application_id_uri → Replace with the Application ID URI from your Function App's App Registration.
- https://az-9640-faapp.azurewebsites.net/api/test_trigger → Replace with the corresponding Function App trigger URL.

```bash
# Set the target resource values to yours
subscriptionsID="01d39075-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
resourceGroupsID="XXXX"
application_id_uri="api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# Variable settings (no need to change)
identityEndpoint="$IDENTITY_ENDPOINT"
identityHeader="$IDENTITY_HEADER"

# Install the necessary tool
apt install -y jq

# Get an access token
tokenUri="${identityEndpoint}?resource=${application_id_uri}&api-version=2019-08-01"
accessToken=$(curl -s -H "Metadata: true" -H "X-IDENTITY-HEADER: $identityHeader" "$tokenUri" | jq -r '.access_token')
echo "Access Token: $accessToken"

# Run the trigger
response=$(curl -s -o response.json -w "%{http_code}" -X GET "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger" -H "Authorization: Bearer $accessToken")
echo "HTTP Status Code: $response"
echo "Response Body:"
cat response.json
```

2.4. If everything is set up correctly, you should see a successful invocation result.
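Since the Function App in this walkthrough runs Python 3.11, the same two-step flow (request a token from the managed-identity endpoint, then call the trigger with a Bearer header) can also be sketched with only the Python standard library. This is a sketch under the same assumptions as the Bash example: IDENTITY_ENDPOINT and IDENTITY_HEADER are injected by App Service, and the Application ID URI and trigger URL below are the same placeholders you must replace.

```python
import json
import os
import urllib.parse
import urllib.request

# Placeholders, as in the Bash example: replace with your own values.
APPLICATION_ID_URI = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
TRIGGER_URL = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"


def build_token_request(identity_endpoint: str, identity_header: str, resource: str) -> urllib.request.Request:
    """Build the managed-identity token request, mirroring the curl call above."""
    query = urllib.parse.urlencode({"resource": resource, "api-version": "2019-08-01"})
    token_uri = f"{identity_endpoint}?{query}"
    return urllib.request.Request(
        token_uri,
        headers={"Metadata": "true", "X-IDENTITY-HEADER": identity_header},
    )


def invoke_trigger() -> None:
    """Acquire a token via the App Service MSI endpoint and call the protected trigger."""
    req = build_token_request(
        os.environ["IDENTITY_ENDPOINT"],
        os.environ["IDENTITY_HEADER"],
        APPLICATION_ID_URI,
    )
    with urllib.request.urlopen(req) as resp:
        access_token = json.load(resp)["access_token"]

    call = urllib.request.Request(TRIGGER_URL, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(call) as resp:
        print("HTTP Status Code:", resp.status)
        print("Response Body:", resp.read().decode())
```

Call `invoke_trigger()` from code running on the Web App itself; the MSI endpoint environment variables only exist inside App Service.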
3. Invoke the Function App Trigger Using Web App (Node.js Example)

I have also provided an example, which you can modify accordingly; save it to /home/site/wwwroot/callFunctionApp.js and run it:

```bash
cd /home/site/wwwroot/
vi callFunctionApp.js
npm init -y
npm install @azure/identity axios
node callFunctionApp.js
```

```javascript
// callFunctionApp.js
const { DefaultAzureCredential } = require("@azure/identity");
const axios = require("axios");

async function callFunctionApp() {
  try {
    const applicationIdUri = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; // Change here
    const credential = new DefaultAzureCredential();

    console.log("Requesting token...");
    const tokenResponse = await credential.getToken(applicationIdUri);
    if (!tokenResponse || !tokenResponse.token) {
      throw new Error("Failed to acquire access token");
    }
    const accessToken = tokenResponse.token;
    console.log("Token acquired:", accessToken);

    const apiUrl = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"; // Change here
    console.log("Calling the API now...");
    const response = await axios.get(apiUrl, {
      headers: { Authorization: `Bearer ${accessToken}` },
    });

    console.log("HTTP Status Code:", response.status);
    console.log("Response Body:", response.data);
  } catch (error) {
    console.error("Failed to call the function", error.response ? error.response.data : error.message);
  }
}

callFunctionApp();
```

Below is my execution result:

3. References

- Tutorial: Managed Identity to Invoke Azure Functions | Microsoft Learn
- How to Invoke Azure Function App with Managed Identity | by Krizzia 🤖 | Medium
- Configure Microsoft Entra authentication - Azure App Service | Microsoft Learn

Get certified as an Azure AI Engineer (AI-102) this summer?
For developers, the accreditation as an Azure AI Engineer—certified through the rigorous AI-102 exam—has become a golden ticket to career acceleration. It isn’t just about coding chatbots or fine-tuning machine learning models; it’s about gaining the confidence (for you and for your business) that you can wield Azure’s toolkits to configure AI solutions that augment human capability. Before we dive in, if you’re planning to become certified as an Azure AI Engineer, you may find this Starter Learning Plan (AI 102) valuable—recently curated by a group of Microsoft experts, purposed for your success. We recommend adding it to your existing learning portfolio. It’s a light introduction that should take less than four hours, but it offers a solid glimpse into what to expect on your journey and the breadth of solutions you might craft in the future. From revolutionizing customer service with intelligent agents to optimizing supply chains through predictive analytics, Azure AI engineers sit at the confluence of technological ingenuity and business transformation. For those with an appetite for problem-solving and a vision for AI-driven futures, this certification isn’t just another badge—it’s an assertion of expertise in a field where demand is outpacing supply. Securing that expertise, however, requires more than just a weekend of cramming. Today’s aspiring AI engineers navigate an ecosystem of learning that is as modern as the field itself. Gone are the days when one could rely solely on a stack of manuals; now, candidates immerse themselves in a medley of Microsoft Learn modules, hands-on labs, AI-powered coding assistants, and community-led study groups. Many take a pragmatic approach—building real-world projects using Azure Cognitive Services and Machine Learning Studio to cement their understanding. 
Others lean on practice exams and structured courses from platforms like Pluralsight and Udemy, ensuring they aren't just memorizing but internalizing the core principles. The AI-102 exam doesn't reward rote knowledge—it demands fluency in designing, deploying, and securing AI solutions, making thorough preparation an indispensable part of the journey. In addition to the above learning plan, we want to offer a few other tips:

- Understand the Exam Objectives: Begin by thoroughly reviewing the AI-102 study guide. This document outlines the key topics and skills assessed, including planning and managing Azure AI solutions, implementing computer vision and natural language processing solutions, and deploying generative AI solutions. Familiarizing yourself with these areas will provide a structured framework for your study plan.

- Mix in Scenario-Based Content: Memorization is part of studying, but if you tire of flashcards and want more "storyline"-style learning content, we recommend adding Microsoft-employee-created learning plans to your mix. They are scenario-based and focus on building a structured understanding of how to accomplish common tasks on Azure. Here are three examples: Modernize for AI Readiness; Build AI apps with Azure; Re-platform AI applications.

- Hands-On Practice: Practical experience is invaluable. Engage with Azure AI services directly by building projects that incorporate computer vision, natural language processing, and other AI functionalities. This hands-on approach not only reinforces theoretical knowledge but also sharpens problem-solving skills in real-world scenarios.

- Utilize Practice Assessments: Gauge your readiness with the free practice assessments provided by Microsoft. These assessments mirror the style and difficulty of actual exam questions, offering detailed feedback and links to additional resources for areas that may require further study.

- Stay Updated on Exam Changes: Certification exams are periodically updated to reflect the latest technologies and practices. Regularly consult the official exam page to stay informed about any changes in exam content or structure.

- Participate in Community Discussions: Engaging with peers through forums and study groups can provide diverse perspectives and insights. The Microsoft Q&A platform is a valuable resource for asking questions, sharing knowledge, and learning from the experiences of others preparing for the same certification.

By systematically incorporating these strategies into your preparation, you'll be well-positioned to excel in the AI-102 exam and advance your career as an Azure AI Engineer. If you have additional tips or thoughts, let us know in the comments area. Good luck!

Speed Up OpenAI Embedding By 4x With This Simple Trick!
In today's fast-paced world of AI applications, optimizing performance should be one of your top priorities. This guide walks you through a simple yet powerful way to reduce OpenAI embedding response sizes by 75%—cutting them from 32 KB to just 8 KB per request. By switching from float32 to base64 encoding in your Retrieval-Augmented Generation (RAG) system, you can achieve a 4x efficiency boost, minimizing network overhead, saving costs, and dramatically improving responsiveness. Let's consider the following scenario.

Use Case: RAG Application Processing a 10-Page PDF

A user interacts with a RAG-powered application that processes a 10-page PDF and uses OpenAI embedding models to make the document searchable from an LLM. The goal is to show how optimizing embedding response size impacts overall system performance.

Step 1: Embedding Creation from the 10-Page PDF

In a typical RAG system, the first step is to embed documents (in this case, a 10-page PDF) to store meaningful vectors that will later be retrieved for answering queries. The PDF is split into chunks. In our example, each chunk contains approximately 100 tokens (for the sake of simplicity), but the recommended chunk size varies based on the language and the embedding model.

Assumptions for the PDF:
- A 10-page PDF has approximately 3325 tokens (roughly 330 tokens per page).
- You'll split this document into 34 chunks (each containing 100 tokens).
- Each chunk will then be sent to the OpenAI embedding API for processing.

Step 2: The User Interacts with the RAG Application

Once the embeddings for the PDF are created, the user interacts with the RAG application, querying it multiple times. Each query is processed by retrieving the most relevant pieces of the document using the previously created embeddings. For simplicity, let's assume:
- The user sends 10 queries, each containing 200 tokens.
- Each query requires 2 embedding requests (since the query is split into 100-token chunks for embedding).
- After embedding the query, the system performs retrieval and returns the most relevant documents (the RAG response).

Embedding Response Size

The OpenAI embedding models take an input of tokens (the text to embed) and return a list of numbers called a vector. This list of numbers represents the "embedding" of the input in the model so that it can be compared with another vector to measure similarity. In RAG, we use embedding models to quickly search for relevant data in a vector database.

By default, embeddings are serialized as an array of floating-point values in a JSON document, so each response from the embedding API is relatively large. The array values are 32-bit floating-point numbers, or float32. Each float32 value occupies 4 bytes, and the embedding vector returned by models like OpenAI's text-embedding-ada-002 typically consists of 1536-dimensional vectors.

The challenge is the size of the embedding response:
- Each response consists of 1536 float32 values (one per dimension).
- 1536 float32 values result in 6144 bytes (1536 × 4 bytes).
- When serialized as UTF-8 text for transmission over the network, this results in approximately 32 KB per response, because each float is written out as a verbose decimal string plus delimiters.

Optimizing Embedding Response Size

One approach to optimizing the embedding response size is to serialize the embedding as base64. This is not compression: it simply transmits the raw float32 bytes in a compact, text-safe form instead of verbose decimal strings, while maintaining the integrity of the embedding information. This leads to a significant reduction in the size of the embedding response.
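The arithmetic above is easy to verify locally. The following standalone sketch (illustrative only: it fabricates a vector rather than calling the embeddings API) compares the JSON float-array serialization with the base64 form, and shows how a client decodes base64 back into float32 values:

```python
import base64
import json
import random
import struct

DIMENSIONS = 1536  # e.g. text-embedding-ada-002 / text-embedding-3-small

# A stand-in embedding vector; a real one would come from the API.
vector = [random.uniform(-1.0, 1.0) for _ in range(DIMENSIONS)]

# Default serialization: a JSON array of decimal strings.
json_size = len(json.dumps(vector).encode("utf-8"))

# base64 serialization: pack the raw little-endian float32 bytes, then encode.
raw = struct.pack(f"<{DIMENSIONS}f", *vector)  # 1536 x 4 bytes = 6144 bytes
encoded = base64.b64encode(raw)                # 6144 / 3 * 4 = 8192 bytes

print(f"JSON float array: ~{json_size} bytes")
print(f"raw float32:      {len(raw)} bytes")
print(f"base64:           {len(encoded)} bytes")

# Client-side decoding restores the float32 values.
decoded = struct.unpack(f"<{DIMENSIONS}f", base64.b64decode(encoded))
print(f"decoded dimensions: {len(decoded)}")
```

Running this shows the JSON form at roughly 30 KB versus a fixed 8192 bytes for base64, matching the ~4x figures in the benchmark below.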
With base64-encoded embeddings, the response size drops from 32 KB to approximately 8 KB, as demonstrated below (base64 vs float32):

| Benchmark (100-token input) | Min (bytes) | Max (bytes) | Mean (bytes) | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| text-embedding-3-small | 32673 | 32751 | 32703.8 | 8192 (4.0x, 74.9%) | 8192 (4.0x, 75.0%) | 8192 (4.0x, 74.9%) |
| text-embedding-3-large | 65757 | 65893 | 65810.2 | 16384 (4.0x, 75.1%) | 16384 (4.0x, 75.1%) | 16384 (4.0x, 75.1%) |
| text-embedding-ada-002 | 32882 | 32939 | 32909.0 | 8192 (4.0x, 75.1%) | 8192 (4.0x, 75.2%) | 8192 (4.0x, 75.1%) |

The source code of this benchmark can be found at: https://github.com/manekinekko/rich-bench-node (kudos to Anthony Shaw for creating the rich-bench Python runner)

Comparing the Two Scenarios

Let's break down and compare the total traffic of the system in two scenarios:
- Scenario 1: Embeddings serialized as float32 (32 KB per response)
- Scenario 2: Embeddings serialized as base64 (8 KB per response)

Scenario 1: Embeddings Serialized as float32

In this scenario, the PDF embedding creation and user queries involve larger responses due to float32 serialization. Let's compute the total response size for each phase:

1. Embedding creation for the PDF:
- 34 embedding requests (one per 100-token chunk).
- 34 responses of 32 KB each.
- Total size for PDF embedding responses: 34 × 32 KB = 1088 KB ≈ 1.088 MB

2. User interactions with the RAG app:
- Each user query consists of 200 tokens (split into 2 chunks of 100 tokens).
- 10 user queries, requiring 2 embedding responses per query (one per chunk).
- Each embedding response is 32 KB.
- Embedding responses: 20 × 32 KB = 640 KB.
- RAG responses: 10 × 32 KB = 320 KB.
- Total size for user interactions: 640 KB (embedding) + 320 KB (RAG) = 960 KB.
3. Total size:
- Total size for embedding responses (PDF + user queries): 1088 KB + 640 KB = 1728 KB ≈ 1.728 MB
- Total size for RAG responses: 320 KB.
- Overall total: 1728 KB + 320 KB = 2048 KB = 2 MB

Scenario 2: Embeddings Serialized as base64

In this optimized scenario, the embedding response size is reduced to 8 KB by using base64 encoding.

1. Embedding creation for the PDF:
- 34 embedding requests.
- 34 responses of 8 KB each.
- Total size for PDF embedding responses: 34 × 8 KB = 272 KB.

2. User interactions with the RAG app:
- Embedding responses for 10 queries, 2 responses per query.
- Each embedding response is 8 KB.
- Embedding responses: 20 × 8 KB = 160 KB.
- RAG responses: 10 × 8 KB = 80 KB.
- Total size for user interactions: 160 KB (embedding) + 80 KB (RAG) = 240 KB

3. Total size (optimized scenario):
- Total size for embedding responses (PDF + user queries): 272 KB + 160 KB = 432 KB.
- Total size for RAG responses: 80 KB.
- Overall total: 432 KB + 80 KB = 512 KB

Performance Gain: Comparison Between Scenarios

The optimized scenario (base64 encoding) is 4 times smaller than the original (float32 encoding): 2048 / 512 = 4. The total size reduction between the two scenarios is 2048 KB − 512 KB = 1536 KB ≈ 1.536 MB, which is a (1536 / 2048) × 100 = 75% reduction.

How to Configure the base64 Encoding Format

To get a vector representation of a given input that can be easily consumed by machine learning models and algorithms, you usually call either the OpenAI API endpoint directly or use one of the official libraries for your programming language.
Calling the OpenAI or Azure OpenAI APIs

Using the OpenAI endpoint:

```bash
curl -X POST "https://api.openai.com/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "The five boxing wizards jump quickly",
    "model": "text-embedding-ada-002",
    "encoding_format": "base64"
  }'
```

Or, calling Azure OpenAI resources:

```bash
curl -X POST "https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{
    "input": ["The five boxing wizards jump quickly"],
    "encoding_format": "base64"
  }'
```

Using OpenAI Libraries

JavaScript/TypeScript:

```javascript
const response = await client.embeddings.create({
  input: "The five boxing wizards jump quickly",
  model: "text-embedding-3-small",
  encoding_format: "base64"
});
```

A pull request has been sent to the openai SDK for Node.js repository to make base64 the default encoding when the user does not provide one. Please feel free to give that PR a thumbs-up.

Python:

```python
embedding = client.embeddings.create(
    input="The five boxing wizards jump quickly",
    model="text-embedding-3-small",
    encoding_format="base64"
)
```

NB: from 1.62, the openai SDK for Python defaults to base64.

Java:

```java
EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams
    .builder()
    .input("The five boxing wizards jump quickly")
    .encodingFormat(EncodingFormat.BASE64)
    .model("text-embedding-3-small")
    .build();
```

.NET: The openai-dotnet library already enforces base64 encoding and does not allow the user to set encoding_format (see the library source).

Conclusion

By optimizing the embedding response serialization from float32 to base64, you achieved a 75% reduction in data size and improved performance by 4x. This reduction significantly enhances the efficiency of your RAG application, especially when processing large documents like PDFs and handling multiple user queries.
For 1 million users sending 1,000 requests per month, the total size saved would be approximately 22.9 TB per month, simply by using base64-encoded embeddings. As demonstrated, optimizing the size of API responses is crucial not only for reducing network overhead but also for improving the overall responsiveness of your application. In a world where efficiency and scalability are key to delivering robust AI-powered solutions, this optimization can make a substantial difference in both performance and user experience.

Shoutout to my colleague Anthony Shaw for the long and great discussions we had about embedding optimisations.

Windows - A JavaScript error occurred in the main process
Hello Everyone,

While installing Microsoft Teams, the error message described in the subject line is thrown. I am running Microsoft Windows Version 1909. A couple of troubleshooting tips found on https://docs.microsoft.com/en-us/MicrosoftTeams/troubleshoot-installation have been tried, but the error still shows up. Any idea on how to resolve this issue will be greatly appreciated.

UPDATES

I did a comparison of two SquirrelSetup.log files. (Both PCs run Windows 1909 with the same build number.) PC 1 (throwing the JavaScript error above) logged some issues not found on PC 2, as shown below:

```
ApplyReleasesImpl: Couldn't run Squirrel hook, continuing: C:\Users\Administrator\AppData\Local\Microsoft\Teams\current\Teams.exe: System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Squirrel.Utility.<>c__DisplayClass11_0.<InvokeProcessAsync>b__0()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Squirrel.Utility.<InvokeProcessAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Squirrel.UpdateManager.ApplyReleasesImpl.<>c__DisplayClass18_1.<<invokePostInstall>b__2>d.MoveNext()
```

FURTHER UPDATES

Many suggestions found on GitHub, the Microsoft support site, blogs, and other websites have been tried. After spending several hours searching for a working solution, the following finally worked on the PC in question.
- A standard user account was created on the PC.
- A fresh Microsoft Teams installer was downloaded.
- In services.msc, the Quality Windows Audio Video Experience service was started manually.
- The app was executed, and no JavaScript error was thrown.

N.B.: I am still not satisfied with this fix, because I applied the same troubleshooting approach on another PC throwing the same JavaScript error, and the fix doesn't work on it. 🙂

Solved

What's the biggest challenge your small business is facing with technology right now?
Hi everyone, We're curious to hear from you all about any technology challenges your business is currently facing. Whether it's managing remote work, cybersecurity concerns, or finding the right tools to streamline operations, let's share our experiences and solutions. Your insights could help others in the community who might be facing similar issues. Looking forward to hearing your thoughts!

Azure App Service Auto-Heal: Capturing Relevant Data During Performance Issues
Introduction

Azure App Service is a powerful platform that simplifies the deployment and management of web applications. However, maintaining application performance and availability is crucial. When performance issues arise, identifying the root cause can be challenging. This is where Auto-Heal in Azure App Service becomes a game-changer.

Auto-Heal is a diagnostic and recovery feature that allows you to proactively detect and mitigate issues affecting your application's performance. It enables automatic corrective actions and helps capture vital diagnostic data to troubleshoot problems efficiently. In this blog, we'll explore how Auto-Heal works, its configuration, and how it assists in diagnosing performance bottlenecks.

What is Auto-Heal in Azure App Service?

Auto-Heal is a self-healing mechanism that allows you to define custom rules to detect and respond to problematic conditions in your application. When an issue meets the defined conditions, Auto-Heal can take actions such as:

- Recycling the application process
- Collecting diagnostic dumps
- Logging additional telemetry for analysis
- Triggering a custom action

By leveraging Auto-Heal, you can minimize downtime, improve reliability, and reduce manual intervention for troubleshooting.

Configuring Auto-Heal in Azure App Service

To set up Auto-Heal, follow these steps:

Access Auto-Heal settings:
1. Navigate to the Azure Portal.
2. Go to your App Service.
3. Select Diagnose and Solve Problems.
4. Search for Auto-Heal, or go to the Diagnostic tools tile and select Auto-Heal.

Define Auto-Heal rules. Auto-Heal allows you to define rules based on:
- Request Duration: If a request takes too long, trigger an action.
- Memory Usage: If memory consumption exceeds a certain threshold.
- HTTP Status Codes: If multiple requests return specific status codes (e.g., 500 errors).
- Request Count: If excessive requests occur within a defined time frame.
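Under the hood, the portal settings described in this post map to the autoHealRules property of the App Service site configuration (Microsoft.Web/sites/config in ARM). As a rough sketch of what a combined rule could look like, with property names following the ARM schema but all thresholds purely illustrative:

```json
{
  "properties": {
    "autoHealEnabled": true,
    "autoHealRules": {
      "triggers": {
        "slowRequests": { "timeTaken": "00:00:30", "count": 1, "timeInterval": "00:05:00" },
        "statusCodes": [
          { "status": 500, "subStatus": 0, "win32Status": 0, "count": 1, "timeInterval": "00:01:00" }
        ],
        "privateBytesInKB": 2000000
      },
      "actions": {
        "actionType": "Recycle",
        "minProcessExecutionTime": "00:05:00"
      }
    }
  }
}
```

The portal's "Custom Action" choices (diagnostic collection, running an executable) correspond to actionType "CustomAction" in the same schema; `minProcessExecutionTime` guards against restart loops by skipping the action while the process is still young.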
Configure Auto-Heal actions. Once conditions are set, you can configure one or more of the following actions:
- Recycle Process: Restart the worker process to restore the application.
- Log Events: Capture logs for further analysis.
- Custom Action: You can do the following:
  - Run Diagnostics: Gather diagnostic data (Memory Dump, CLR Profiler, CLR Profiler with Thread Stacks, Java Memory Dump, Java Thread Dump) for troubleshooting.
  - Run any Executable: Run scripts to automate corrective measures.

Capturing Relevant Data During Performance Issues

One of the most powerful aspects of Auto-Heal is its ability to capture valuable diagnostic data when an issue occurs. Here's how:

- Collecting memory dumps: Memory dumps provide insights into application crashes, high CPU, or high memory usage. These can be analyzed using WinDbg or DebugDiag.
- Enabling logs for deeper insights: Auto-Heal logs detailed events in the Kudu Console, Application Insights, and Azure Monitor Logs. This helps identify patterns and root causes.
- Collecting CLR Profiler traces: CLR Profiler traces capture call stacks and exceptions, providing a user-friendly report for diagnosing slow responses and HTTP issues at the application code level.

In this article, we will cover the steps to configure an Auto-Heal rule for the following performance issues:
- Capture a .NET Profiler/CLR Profiler trace for slow responses.
- Capture a .NET Profiler/CLR Profiler trace for HTTP 5XX status codes.
- Capture a memory dump for high memory usage.

Auto-Heal rule to capture a .NET Profiler trace for slow responses:

1. Navigate to your App Service in the Azure Portal, and click on Diagnose and Solve Problems.
2. Search for Auto-Heal, or go to the Diagnostic tools tile and select Auto-Heal.
3. Click "On".
4. Select Request Duration and click Add Slow Request rule.
5. Add the following information with respect to how much slowness you are facing:

After how many slow requests do you want this condition to kick in?
- After how many slow requests this Auto-Heal rule should start capturing relevant data.

What should be the minimum duration (in seconds) for these slow requests? - How many seconds a request should take to be considered slow.

What is the time interval (in seconds) in which the above condition should be met? - The window, in seconds, within which the defined slow requests should occur.

What is the request path (leave blank for all requests)? - If a specific URL is slow, you can add it in this section, or leave it blank.

In the screenshot below, the rule is set for this example: "1 request taking 30 seconds within 5 minutes (300 seconds) should trigger this rule". Add the values in the text boxes and click "Ok".

6. Select Custom Action and choose the CLR Profiler with Thread Stacks option.

7. The tool options provide three choices:
- CollectKillAnalyze: The tool will collect the data, analyze and generate the report, and recycle the process.
- CollectLogs: The tool will collect the data only. It will not analyze, generate the report, or recycle the process.
- Troubleshoot: The tool will collect the data and analyze and generate the report, but it will not recycle the process.

Select the option according to your scenario and click "Save".

8. Review the new settings of the rule. Clicking "Save" will cause a restart, because this is a configuration-level change and a restart is required for it to take effect. It is therefore advised to make such changes during non-business hours.

9. Click "Save". Once you do, the app will restart, and the rule will become active and monitor for slow requests.

Auto-Heal rule to capture a .NET Profiler trace for HTTP 5XX status codes:

For this scenario, steps 1, 2, and 3 remain the same as above (from the slow requests scenario), with the following changes:

1. Select Status code and click Add Status Code rule.
2. Add the following values for the status code, or range of status codes, you want this rule to be triggered by:

Do you want to set this rule for a specific status code or a range of status codes? - Whether this rule targets a single status code or a range.

After how many requests do you want this condition to kick in? - After how many requests returning the concerned status code this Auto-Heal rule should start capturing relevant data.

What should be the status code for these requests? - Mention the status code here.

What should be the sub-status code for these requests? - Mention the sub-status code here, if any; otherwise leave it blank.

What should be the win32-status code for these requests? - Mention the win32 status code here, if any; otherwise leave it blank.

What is the time interval (in seconds) in which the above condition should be met? - The window, in seconds, within which the defined status codes should occur.

What is the request path (leave blank for all requests)? - If a specific URL is throwing that status code, you can add it here, or leave it blank.

Add the values according to your scenario and click "Ok". In the screenshot below, the rule is set for this example: "1 request throwing an HTTP 500 status code within 60 seconds should trigger this rule".

After adding the above information, you can follow steps 6, 7, 8, and 9 from the first scenario (slow requests), and the Auto-Heal rule for the status code will become active and monitor for this performance issue.

Auto-Heal rule to capture a memory dump for high memory usage:

For this scenario, steps 1, 2, and 3 remain the same as above (from the slow requests scenario), with the following changes:

1. Select Memory Limit and click Configure Private Bytes rule.
2. According to your application's memory usage, add the private bytes value in KB at which this rule should be triggered. In the screenshot below, the rule is set for this example: "the application process using 2000000 KB (~2 GB) should trigger this rule". Click "Ok".

3. In Configure Actions, select Custom Action and click Memory Dump.

4. The tool options provide three choices:
- CollectKillAnalyze: The tool will collect the data, analyze and generate the report, and recycle the process.
- CollectLogs: The tool will collect the data only. It will not analyze, generate the report, or recycle the process.
- Troubleshoot: The tool will collect the data and analyze and generate the report, but it will not recycle the process.

Select the option according to your scenario.

5. For the memory dumps/reports to be saved, you will have to select an existing Storage Account or create a new one: click Select, then create a new account or choose an existing one.

6. Once the storage account is set, click "Save". Review the rule settings and click "Save" again. Clicking "Save" will cause a restart, because this is a configuration-level change and a restart is required for it to take effect. It is therefore advised to make such changes during non-business hours.

Best Practices for Using Auto-Heal

- Start with conservative rules: Avoid overly aggressive auto-restarts to prevent unnecessary disruptions.
- Monitor performance trends: Use Azure Monitor to correlate Auto-Heal events with performance metrics.
- Regularly review logs: Periodically analyze collected logs and dumps to fine-tune your Auto-Heal strategy.
- Combine with Application Insights: Leverage Application Insights for end-to-end monitoring and deeper diagnostics.

Conclusion

Auto-Heal in Azure App Service is a powerful tool that not only helps maintain application stability but also provides critical diagnostic data when performance issues arise.
By proactively setting up Auto-Heal rules and leveraging its diagnostic capabilities, you can minimize downtime and streamline troubleshooting efforts. Have you used Auto-Heal in your application? Share your experiences and insights in the comments! Stay tuned for more Azure tips and best practices!

[On demand] Delivering like-local Windows experiences from the cloud
Learn how Windows cloud features like RDP Multipath and TURN improve connectivity and reduce connection times, while HEVC hardware acceleration and enhanced device redirection boost performance. Watch Delivering like-local Windows experiences from the cloud – now on demand – and join the conversation at https://aka.ms/LikeLocalInTheCloud. To help you learn more, here are the links referenced in the session: Hardware-accelerated HEVC (H.265) graphics encoding is currently in public preview! See Enable GPU acceleration for Azure Virtual Desktop | Microsoft Learn for more details. For more free technical skilling on the latest in Windows, Windows in the cloud, and Microsoft Intune, view the full Microsoft Technical Takeoff session list.

Optimizing RDP Connectivity for Windows 365
Updated with RDP & Zscaler connectivity improvements, February 2025

The use of VPN or Secure Web Gateway (SWG) client software or agents to provide tunneled access to on-premises resources, in addition to protected internet access via a cloud-based SWG or a legacy VPN and on-premises proxy path, is very common in Windows 365 and AVD deployments. This is especially the case when deployed in the recommended Windows 365 with Microsoft Hosted Network (MHN) model, where the Cloud PC is located on a network with direct, open, high-speed internet access. The more modern, cloud-based SWG solutions fit perfectly with this modern Zero-Trust approach and generally perform at a higher level than traditional VPN software, where internet browsing is hairpinned through on-premises proxies and back out to the internet. As many Windows 365 customers use such solutions as part of their deployment, this post outlines the specific configuration guidelines Microsoft recommends to optimize key traffic and provide the highest levels of user experience.

What is the Problem?

Many of these VPN/SWG solutions build a tunnel in the user context, which means that when a user logs in to their device, the service starts and creates the tunnels required to provide both internet and private access as defined for that user. With a physical device the tunnel is normally up and running before, or shortly after, the user sees their desktop on screen, meaning they can quickly get on with their work without noticing its presence. However, as with any virtualized device which needs a remote connection to access it, this model poses several challenges:

1. Additional Latency

Remote desktop traffic is latency sensitive: delay in the traffic reaching its destination easily translates into a poor user experience, with lag on actions and desktop display.
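A quick way to quantify this added latency is to time TCP handshakes to the remote endpoint with the tunnel active and inactive. The sketch below is illustrative only and not part of any Microsoft tooling; the hostname in the example is a placeholder.

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Return the TCP handshake time to host:port in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # Connection established; we only care about handshake time.
    return (time.perf_counter() - start) * 1000.0

# Hypothetical usage: run with and without the VPN/SWG client active and
# compare the results. 'rdgateway.example.net' is a placeholder hostname.
# print(f"{tcp_connect_ms('rdgateway.example.net'):.1f} ms")
```

Comparing the two measurements gives a rough sense of the round-trip penalty the tunnel adds to latency-sensitive RDP traffic.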
Routing this traffic through a tunnel to an intermediary device adds latency and can restrict throughput, regardless of how well configured or well performing that device is. Modern SWG solutions tend to perform at a much higher level than a traditional VPN/proxy approach, but the highest level of experience is always achieved through a direct connection, avoiding any inspection or intermediary devices. Much like Teams media traffic, the RDP traffic in the Windows 365 case should be routed via the most optimal path between the two endpoints to deliver the very highest levels of performance; this is almost always the direct path via the nearest network egress. From the Cloud PC side, direct egress also means the traffic never leaves Microsoft's managed network.

2. RDP Connection Drops

An additional challenge comes from the use of user-based tunnels. When the user initiates a connection to the Cloud PC, the connection reaches the session host without issue and the user successfully sees the initial logon screen. However, once user login starts and the client software builds the tunnels to the SWG/VPN for that user, the login screen freezes. The connection then drops, and the reconnection process has to run to re-establish the connection to the Cloud PC. Once this is complete, the user can use the Cloud PC without further issue. Users may also experience disconnects of the remote session if there is any issue with the tunnel, for example if the tunnel drops momentarily. Overall, this doesn't provide a great user experience with the Cloud PC, especially on initial login.

Why does this occur?

It occurs because the tunnels built to route internet traffic to the SWG generally capture all internet-bound traffic unless configured not to do so: a forced tunnel, or 'inverse split tunnel'.
This means the initial login works without issue, but as soon as the tunnel is established at user logon, the RDP traffic is transferred into it and, being on a new path, requires reconnecting. Equally, because the traffic is inside this tunnel, if the tunnel drops momentarily and needs to reconnect, the RDP session also has to reconnect inside the re-established tunnel. The diagram below shows a simplified representation of this indirect connectivity approach with a forced tunnel in place: RDP traffic has to traverse the VPN/SWG resources before hitting the gateway handling the traffic. Whilst this is not a problem for less sensitive traffic and general web browsing, it is non-optimal for latency-critical traffic such as Teams and RDP.

What's the Solution?

Microsoft strongly recommends implementing a forced tunnel exception for the critical RDP traffic, so that it does not enter the tunnel to the SWG or VPN gateway and is instead routed directly to its destination. This solves both of the above problems by providing a direct path for the RDP traffic and ensuring it isn't impacted by changes in tunnel state. This is the same model used for specific 'Optimize'-marked Office 365 traffic such as Teams media. On the Cloud PC side, this also means the traffic never leaves Microsoft's managed network.

What exactly do I need to bypass from these tunnels?

Previously, solving this problem meant significant complexity due to the large number of IP addresses required to configure optimization for this RDP traffic; we provided a script as part of this blog to assist with collecting and formatting those IPs. I'm pleased to share that Microsoft has invested in an extensive and complex piece of work to solve this challenge by building a new, upgraded global gateway infrastructure that allows it to be addressed from a single subnet.
In addition to that simplification, we have planned so that this subnet should not see any regular change, abstracting customers from churn as we scale the infrastructure and add new regions in future. As of February 2025, this work has been completed and the old infrastructure decommissioned, all with zero downtime for our customers. TCP-based RDP traffic is now covered by two single subnets rather than the many hundreds of addresses previously required. Further improvement work is due to be delivered in the coming months for UDP-based RDP, providing new dedicated and globally scaled TURN infrastructure. This post will be updated when that is complete and RDP connectivity is therefore in its final, simplified, and secured state. The temporary elements are:

1. The WindowsVirtualDesktop service tag still contains a large number of /32 addresses which were for the old infrastructure. The process to remove these from the tag is under way and will likely take a small number of weeks (ETA mid-March 2025). The endpoints outlined in the table below can be used for configuration without risk, as the old infrastructure corresponding to the old IPs has already been decommissioned.

2. UDP-based RDP via TURN is currently using the subnet 20.202.0.0/16 but will switch to 51.5.0.0/16 in H1 CY25. The new, dedicated subnet is in the WindowsVirtualDesktop service tag, but the current one (20.202.0.0/16) is not, so it will need to be added manually to the current bypass configuration if desired. More on this can be found in this post. This work will also vastly expand our global TURN relay availability, which today is only available when the physical device is in the vicinity of these Azure regions.
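While the service tag still carries legacy /32 entries, one way to track it is to filter the published Azure service-tags JSON for the WindowsVirtualDesktop prefixes. The sketch below assumes the structure of the downloadable ServiceTags file (`values[].name` and `properties.addressPrefixes`); the sample data is illustrative, not the real tag contents.

```python
def wvd_prefixes(service_tags: dict) -> list[str]:
    """Extract the address prefixes for the WindowsVirtualDesktop service tag."""
    for tag in service_tags.get("values", []):
        if tag.get("name") == "WindowsVirtualDesktop":
            return tag["properties"]["addressPrefixes"]
    return []

# Illustrative sample shaped like the downloadable ServiceTags JSON file.
sample = {
    "values": [
        {"name": "WindowsVirtualDesktop",
         "properties": {"addressPrefixes": ["40.64.144.0/20", "51.5.0.0/16"]}},
        {"name": "AzureCloud",
         "properties": {"addressPrefixes": ["20.0.0.0/8"]}},
    ]
}

print(wvd_prefixes(sample))  # → ['40.64.144.0/20', '51.5.0.0/16']
```

Running this against each weekly ServiceTags download and diffing the output is a simple way to spot prefix changes as the legacy entries are removed.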
RDP based Connectivity bypass:

As of February 2025, the critical traffic which carries RDP is contained within the following simplified endpoints:

RDP Endpoints for Optimization

| Row | Endpoint | Protocol | Port | Purpose |
|---|---|---|---|---|
| 1 | *.wvd.microsoft.com | TCP | 443 | Core TCP based RDP and other critical service traffic |
| 2 | 40.64.144.0/20 | TCP | 443 | Core TCP based RDP |
| 3 | 20.202.0.0/16 | UDP | 3478 | Core UDP based RDP via TURN - Current |
| 4 | 51.5.0.0/16 | UDP | 3478 | Core UDP based RDP via TURN - Future (currently not in use) |

Please see this article for more information on rows 3 & 4.

In some network equipment/software we can configure bypass using FQDNs and wildcard FQDNs alone, and we'd recommend that this method (row 1) is used in addition to the IP-based rules if possible. However, some solutions do not allow the use of wildcard FQDNs, so it's common to see only IP addresses used for this bypass configuration. In that case you can use the newly simplified rows 2 & 3 in the table above, making sure row 1 is still accessible via the SWG/proxy. We also recommend adding row 4 to manually configured optimizations to ensure it is also optimized when it comes into use in the coming months. There are also a small number of other endpoints which should be bypassed on the Cloud PC side.

Other required VPN/SWG bypass requirements:

Other endpoints for Optimization

| Row | Endpoint | Protocol | Port | Purpose |
|---|---|---|---|---|
| 5 | azkms.core.windows.net | TCP | 1688 | Azure KMS - traffic needs to arrive from Azure public IPs |
| 6 | 169.254.169.254 | TCP | 80 | Azure Fabric communication |
| 7 | 168.63.129.16 | TCP | 80 | Azure Fabric communication |

These additional bypass requirements (rows 5-7) are not RDP related but are required for the following reasons:

Row 5 – This is Azure KMS activation, which is a required endpoint for a Cloud PC and AVD session hosts. The traffic for this needs to arrive from an Azure public IP; if not, the connection will not be successful. Therefore it should not be sent via a 3rd-party internet egress such as an SWG or proxy.
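As an illustration only (not vendor configuration), the IP-based rules in rows 2-4 can be expressed as a small route check using Python's `ipaddress` module:

```python
import ipaddress

# IP-based bypass rules from rows 2-4 above: (protocol, port, subnet).
BYPASS_RULES = [
    ("TCP", 443,  ipaddress.ip_network("40.64.144.0/20")),
    ("UDP", 3478, ipaddress.ip_network("20.202.0.0/16")),  # TURN - current
    ("UDP", 3478, ipaddress.ip_network("51.5.0.0/16")),    # TURN - future
]

def should_bypass(ip: str, protocol: str, port: int) -> bool:
    """Return True if traffic to ip:port should skip the VPN/SWG tunnel."""
    addr = ipaddress.ip_address(ip)
    return any(proto == protocol and p == port and addr in net
               for proto, p, net in BYPASS_RULES)

print(should_bypass("40.64.150.1", "TCP", 443))  # → True
print(should_bypass("40.64.150.1", "TCP", 80))   # → False
```

Row 1 (the wildcard FQDN) is deliberately omitted here, since FQDN bypasses are matched on the hostname by the client software rather than on the destination IP.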
IP addresses corresponding to the FQDN can be found via the link above if required.

Rows 6 & 7 – These are critical IP addresses used to communicate with the Azure Fabric to operate the VM. We need to ensure these are not inadvertently sent into any VPN/SWG tunnel, where they would not be able to reach their destination in Azure.

How do I implement the RDP bypass in common VPN/SWG solutions?

Microsoft is working with several partners in this space to provide bespoke guidance, and we'll add detailed guidance for other solutions here as we get them confirmed. Already available, however, is Zscaler ZIA.

Zscaler Client Connector

The changes outlined above should make configuration in all scenarios vastly simpler moving forward. Due to some fantastic work by our friends at Zscaler to assist our mutual customers, as of February 2025 and version 4.3.2 of the Zscaler Client Connector, the majority of the Windows 365 and AVD traffic which requires optimization, including RDP, can be bypassed with a single-click configuration within a predefined IP-based bypass!

Zscaler ZIA Configuration

1. Version 4.3.2 (released February 2025) of the Zscaler Client Connector portal enables this feature. Ensure a recent version of the Client Connector is installed on the Cloud PC (and the physical device, if Zscaler is used there) to take advantage.

2. In the Zscaler Client Connector Portal, select the new IP-Based, Predefined Application Bypass for Windows 365 & Azure Virtual Desktop. This contains a preconfigured bypass for RDP and KMS traffic.

3. Add the following endpoints to the bypass configuration manually, as they are not included in the automatic bypass. As noted above, 20.202.0.0/16 will become unnecessary in a few months and will be removed from this document when decommissioned. Its replacement (51.5.0.0/16) is already included in the preconfigured bypass.
| Endpoint | Protocol | Port | Purpose |
|---|---|---|---|
| 20.202.0.0/16 | UDP | 3478 | Core UDP based RDP via TURN - Current |
| 169.254.169.254 | TCP | 80 | Azure Fabric communication |
| 168.63.129.16 | TCP | 80 | Azure Fabric communication |

Other VPN/SWG solutions

Microsoft is currently working with other partners in this space to provide detailed guidance for other VPN/SWG solutions and will list them here as they are complete. Please let us know in the comments if you'd like us to list a particular solution, and we'll aim to prioritize based on feedback. In the interim, use rows 1-7 in the tables above to create manual bypasses from VPN/SWG/proxy tunnels. This should be significantly simpler, with much lower change rates than previously, due to the IP consolidation.

FAQs:

Q: In a Microsoft Hosted Network deployment, is there anything else I need to do?
A: Unless the local Windows firewall is configured to block access to the endpoints noted, nothing else should be required; the network the virtual NIC sits in has direct, high-speed connectivity to Microsoft's backbone and the internet.

Q: In an Azure Network Connection scenario, is there anything further I need to do?
A: In this scenario, the recommended path for the traffic is directly out of the VNet into Microsoft's backbone. Depending on the configuration, it may require allowing the endpoints noted in this article through a firewall or NSG. The WindowsVirtualDesktop service tag or FQDN tag may help with automating rules in firewalls or configuring User Defined Routing. RDP traffic specifically should be sent directly into Microsoft's backbone via a NAT Gateway or similar, with no TLS inspection, avoiding putting load on NVAs such as firewalls.

Q: Do I need to configure the bypass on just the Cloud PC?
A: RDP connectivity (rows 1-4) is used identically on both the physical and cloud sides. It is strongly advised that the bypass is applied to both the Cloud PC and the connecting client, if the client also uses the SWG/VPN to connect.
If both are using the same configuration profile, this should happen automatically. Rows 5-7 are only required on the cloud side.

Q: How often do the IP addresses change?
A: Now that the improvement work is complete, we don't anticipate regular change. You can monitor the WindowsVirtualDesktop service tag for changes if desired, and we're working on getting these requirements into the M365 Web Service longer term for monitoring and automation.

Q: Can I add more than the RDP traffic to the bypass?
A: Microsoft only provides IP addresses for the RDP connectivity at present. However, if your solution is capable of configuration by FQDN alone, you can add other service endpoints to your optimized path; these can be found on this Microsoft docs page.

Q: I'm using a true split tunnel; does this impact me?
A: The above advice is for a forced tunnel scenario (inverse split tunnel), where the default path is via the tunnel and only defined exceptions are sent direct. This is often referred to as a split tunnel in common parlance and is the most commonly seen deployment model for such solutions. However, a split tunnel in the technically accurate sense, where the default path is the internet and only defined endpoints (such as corporate server ranges/names) are sent down the tunnel, shouldn't need such configuration, as the RDP traffic should follow the default path to the internet.

Q: Does this also optimize RDP Shortpath?
A: RDP Shortpath for Public Networks provides a UDP-based RDP connection between the client and Cloud PC if enabled and achievable. This connection is in addition to the TCP-based connection described above, and the dynamic virtual channels such as graphics and input are switched into the UDP connection if deemed optimal. Rows 3 & 4 above cover this traffic for connectivity via TURN relays. Please see this article for more information on this connectivity model.

Q: Is this advice also shared in Microsoft's official documentation?
A: We're currently working on uplifting the entire connectivity documentation for Windows 365, and the above will form part of this work in the coming months. We'll share the official link in this blog when available.

Q: Does this advice apply equally to AVD?
A: Yes, both Windows 365 and AVD have exactly the same requirements in terms of the connectivity discussed in this blog.

Capture .NET Profiler Trace on the Azure App Service platform
Summary

The article provides guidance on using the .NET Profiler Trace feature in Microsoft Azure App Service to diagnose performance issues in ASP.NET applications. It explains how to configure and collect the trace by accessing the Azure Portal, navigating to the Azure App Service, and selecting the "Collect .NET Profiler Trace" feature. Users can choose between "Collect and Analyze Data" or "Collect Data only" and must select the instance to perform the trace on. The trace stops after 60 seconds but can be extended up to 15 minutes. After analysis, users can view the report online or download the trace file for local analysis, which includes information like slow requests and CPU stacks. The article also details how to analyze the trace using PerfView, a tool available on GitHub, to identify performance issues. Additionally, it provides a table outlining scenarios for using a .NET Profiler Trace or a memory dump based on factors like issue type and symptom code. This tool is particularly useful for diagnosing slow or hung ASP.NET applications and is available only in Standard or higher SKUs with the Always On setting enabled.

In this article:

- How to configure and collect the .NET Profiler Trace
- How to download the .NET Profiler Trace
- How to analyze a .NET Profiler Trace
- When to use .NET Profiler tracing vs. a memory dump

The tool is exceptionally well suited for scenarios where an ASP.NET application is performing slower than expected or gets hung. As shown in Figure 1, this feature is available only in a Standard or higher Stock Keeping Unit (SKU) with Always On enabled. If you try to configure a .NET Profiler Trace without both configurations, the following messages are rendered.

Azure App Service Diagnose and solve problems blade in the Azure Portal error messages

Error – This tool is supported only on Standard, Premium, and Isolated Stock Keeping Unit (SKU) only with AlwaysOn setting enabled to TRUE.
Error – We determined that the web app is not "Always-On" enabled and diagnostic does not work reliably with Auto Heal. Turn on the Always-On setting by going to the Application Settings for the web app and then run these tools.

How to configure and collect the .NET Profiler Trace

To configure a .NET Profiler Trace, access the Azure Portal and navigate to the Azure App Service which is experiencing a performance issue. Select Diagnose and solve problems and then the Diagnostic Tools tile.

Azure App Service Diagnose and solve problems blade in the Azure Portal

Select the "Collect .NET Profiler Trace" feature on the Diagnostic Tools blade and the following blade is rendered. Notice that you can only select Collect and Analyze Data or Collect Data only. Choose the one you prefer, but do consider having the feature perform the analysis; you can still download the trace for offline analysis if necessary. Also notice that you need to **select the instance** on which you want to perform the trace. In this scenario there is only one, so the selection is simple. However, if your app runs on multiple instances, either select them all or, if you have identified a specific instance which is behaving slowly, select only that one. You realize the best results if you can isolate a single instance enough that the request you send is the only one received on that instance. However, in a scenario where the request or instance is not known, the trace still adds value and insights. Adding a thread report collects a list of all the threads in the process at the end of the profiler trace. The thread report is especially useful if you are troubleshooting hung processes, deadlocks, or requests taking more than 60 seconds. Note that it pauses your process for a few seconds until the thread dump is generated. CAUTION: a thread report is NOT recommended if you are experiencing high CPU in your application; you may experience issues during trace analysis if CPU consumption is high.
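If the default 60-second capture window is too short to reproduce the issue, the collection time can be raised via the IIS_PROFILING_TIMEOUT_IN_SECONDS app setting. A hedged sketch using the Azure CLI follows; the app and resource-group names are placeholders, and the command is echoed rather than executed so it can be reviewed first.

```shell
# Placeholders - substitute your own app name and resource group.
APP="my-webapp"
RG="my-resource-group"

# Raise the .NET Profiler Trace capture window to the 15-minute maximum.
# Echoed rather than executed so the command can be reviewed first;
# remove 'echo' to apply it for real.
echo az webapp config appsettings set \
  --name "$APP" \
  --resource-group "$RG" \
  --settings IIS_PROFILING_TIMEOUT_IN_SECONDS=900
```

The setting takes effect on the next trace collection; values above 900 are not honored.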
Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace blade in the Azure Portal

There are a few points called out in the previous image which are important to read and consider. Specifically, the .NET Profiler Trace will stop 60 seconds after it is started. Therefore, if you can reproduce the issue, have the reproduction steps ready before you start profiling. If you are not able to reproduce the issue, you may need to run the trace a few times until the slowness or hang occurs. The collection time can be increased up to 15 minutes (900 seconds) by adding an application setting named IIS_PROFILING_TIMEOUT_IN_SECONDS with a value of up to 900. After selecting the instance to perform the trace on, press the Collect Profiler Trace button, wait for the profiler to start as seen here, then reproduce the issue or wait for it to occur.

Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace status starting window

After the issue is reproduced, the .NET Profiler Trace continues to the next step of stopping, as seen here.

Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace status stopping window

Once stopped, the process continues to the analysis phase if you selected the Collect and Analyze Data option, as seen in the following image; otherwise you are provided a link to download the file for analysis on your local machine. The analysis can take some time, so be patient.

Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace status analyzing window

After the analysis is complete, you can either view the analysis online or download the trace file for local analysis.

How to download the .NET Profiler Trace

Once the analysis is complete, you can view the report by selecting the link in the Reports column, as seen here.

Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace status complete window

Clicking on the report, you see the following.
There is some useful information in this report, like a list of slow requests, failed requests, thread call stacks, and CPU stacks. Also shown is a breakdown of where the time was spent during response generation, in categories like Application Code, Platform, and Network. In this case, all the time is spent in the application code.

Azure App Service Diagnose and solve problems, Collect .NET Profiler Trace review the Report

To find out specifically where in the application code this time is spent, perform the analysis of the trace locally.

How to analyze a .NET Profiler Trace

After downloading the trace by selecting the link in the Data column, you can use a tool named PerfView, which is downloadable from GitHub. Begin by opening PerfView and double-clicking on the ".DIAGSESSION" file; after some moments, expand it to render the Event Trace Log (ETL) file, as shown here.

Analyze Azure App Service .NET Profiler Trace with PerfView

Double-click on the Thread Time (with StartStop Activities) Stacks entry, which opens a new window similar to the one shown next. If your App Service is configured as out-of-process, select the dotnet process associated with your app code. If your App Service is in-process, select the w3wp process.

Analyze Azure App Service .NET Profiler Trace with PerfView, dotnet out-of-process

Double-click on dotnet and another window is rendered, as shown here. From the report reviewed earlier, it is clear where the slowness is coming from; find that entry in the Name column or search for it by entering the page name into the Find text box.

Analyze Azure App Service .NET Profiler Trace with PerfView, dotnet out-of-process, method, and class discovery

Once found, right-click on the row and select Drill Into from the pop-up menu, shown here. Select the Call Tree tab and the reason for the issue renders, showing which request was performing slowly.
Analyze Azure App Service .NET Profiler Trace with PerfView, dotnet out-of-process, root cause

This example is relatively simple. As you analyze more performance issues using PerfView to analyze a .NET Profiler Trace, your ability to find the root cause of more complicated performance issues improves.

When to use .NET Profiler tracing vs. a memory dump

The same issue can be seen in a memory dump; however, there are some scenarios where a .NET Profiler Trace would be best. Table 1 describes the scenarios for when to capture a .NET Profiler Trace or a memory dump.

| Issue Type | Symptom Code | Symptom | Stack | Startup Issue | Intermittent | Scenario |
|---|---|---|---|---|---|---|
| Performance | 200 | Requests take 500 ms to 2.5 seconds, or take <= 60 seconds | ASP.NET/ASP.NET Core | No | No | Profiler |
| Performance | 200 | Requests take > 60 seconds & < 230 seconds | ASP.NET/ASP.NET Core | No | No | Dump |
| Performance | 502.3/500.121/503 | Requests take >= 120 to <= 230 seconds | ASP.NET | No | No | Dump, Profiler |
| Performance | 502.3/500.121/503 | Requests timing out >= 230 seconds | ASP.NET/ASP.NET Core | Yes/No | Yes/No | Dump |
| Performance | 502.3/500.121/503 | App hangs or deadlocks (e.g., due to an async anti-pattern) | ASP.NET/ASP.NET Core | Yes/No | Yes/No | Dump |
| Performance | 502.3/500.121/503 | App hangs on startup (e.g., caused by a non-async deadlock issue) | ASP.NET/ASP.NET Core | No | Yes/No | Dump |
| Performance | 502.3/500.121 | Request timing out >= 230 seconds (time out) | ASP.NET/ASP.NET Core | No | No | Dump |
| Availability | 502.3/500.121/503 | High CPU causing app downtime | ASP.NET | No | No | Profiler, Dump |
| Availability | 502.3/500.121/503 | High memory causing app downtime | ASP.NET/ASP.NET Core | No | No | Dump |
| Availability | 500.0[121]/503 | SQLException or some other exception causes app downtime | ASP.NET | No | No | Dump, Profiler |
| Availability | 500.0[121]/503 | App crashing due to fatal exception at native layer | ASP.NET/ASP.NET Core | Yes/No | Yes/No | Dump |
| Availability | 500.0[121]/503 | App crashing due to exit code (e.g., 0xC0000374) | ASP.NET/ASP.NET Core | Yes/No | Yes/No | Dump |
| Availability | 500.0 | App throwing nonfatal exceptions (during the context of a request) | ASP.NET | No | No | Profiler, Dump |
| Availability | 500.0 | App throwing nonfatal exceptions (during the context of a request) | ASP.NET/ASP.NET Core | No | Yes/No | Dump |

Table 1, when to capture a .NET Profiler Trace or a memory dump on Azure App Service, Diagnose and solve problems

Use this list as a guide to help decide how to approach solving performance and availability problems occurring in your application source code. Here are descriptions of the column headings:

- Issue Type – Performance means that a request to the app is responding or processing the response, but not at the expected speed. Availability means that the request is failing or consuming more resources than expected.
- Symptom Code – the HTTP status and/or substatus returned by the request.
- Symptom – a description of the behavior experienced while engaging with the application.
- Stack – this table targets .NET, specifically ASP.NET and ASP.NET Core applications.
- Startup Issue – if "No", the issue is not at startup. If "Yes/No", the Scenario is also useful for troubleshooting startup issues.
- Intermittent – if "No", the issue is not intermittent, or it can be reproduced. If "Yes/No", the Scenario is useful if the issue happens randomly or cannot be reproduced, meaning the tool can be set to trigger on a specific event or left running for a set amount of time until the exception happens.
- Scenario – "Profiler" means that collecting a .NET Profiler Trace is recommended. "Dump" means a memory dump is your best option. If both are listed, both can be useful for the given symptoms and symptom codes.
| Product | Stack | Hosting | Symptom | Capture | Analyze | Scenario |
|---|---|---|---|---|---|---|
| App Service | Windows | in | High CPU | link | link | Dump |
| App Service | Windows | in | High Memory | link | link | Dump |
| App Service | Windows | in | Terminate | link | link | Dump |
| App Service | Windows | in | Hang | link | link | Dump |
| App Service | Windows | out | High CPU | link | link | Dump |
| App Service | Windows | out | High Memory | link | link | Dump |
| App Service | Windows | out | Terminate | link | link | Dump |
| App Service | Windows | out | Hang | link | link | Dump |
| Function App | Windows | in | High CPU | link | link | Dump |
| Function App | Windows | in | High Memory | link | link | Dump |
| Function App | Windows | in | Terminate | link | link | Dump |
| Function App | Windows | in | Hang | link | link | Dump |
| Function App | Windows | out | High CPU | link | link | Dump |
| Function App | Windows | out | High Memory | link | link | Dump |
| Function App | Windows | out | Terminate | link | link | Dump |
| Function App | Windows | out | Hang | link | link | Dump |
| Azure WebJob | Windows | in | High CPU | link | link | Dump |
| App Service | Windows | in | High CPU | link | link | .NET Profiler |
| App Service | Windows | in | Hang | link | link | .NET Profiler |
| App Service | Windows | in | Exception | link | link | .NET Profiler |
| App Service | Windows | out | High CPU | link | link | .NET Profiler |
| App Service | Windows | out | Hang | link | link | .NET Profiler |
| App Service | Windows | out | Exception | link | link | .NET Profiler |

Table 2, short video instructions on capturing and analyzing dumps and profiler traces

Here are a few other helpful videos for troubleshooting Azure App Service availability and performance issues:

- View Application EventLogs Azure App Service
- Add Application Insights To Azure App Service

Prior to capturing and analyzing memory dumps, consider viewing this short video: Setting up WinDbg to analyze Managed code memory dumps, and this blog post titled: Capture memory dumps on the Azure App Service platform.

Question & Answers

- Q: What are the prerequisites for using the .NET Profiler Trace feature in Azure App Service?
A: To use the .NET Profiler Trace feature in Azure App Service, the application must be running on a Standard or higher Stock Keeping Unit (SKU) with the Always On setting enabled. If these conditions are not met, the tool will not function, and error messages will be displayed indicating the need for these configurations.

- Q: How can you extend the default collection time for a .NET Profiler Trace beyond 60 seconds?

A: The default collection time for a .NET Profiler Trace is 60 seconds, but it can be extended up to 15 minutes (900 seconds) by adding an application setting named IIS_PROFILING_TIMEOUT_IN_SECONDS with a value of up to 900. This allows a longer window to capture the necessary data for analysis.

- Q: When should you use a .NET Profiler Trace instead of a memory dump for diagnosing performance issues in an ASP.NET application?

A: A .NET Profiler Trace is recommended for diagnosing performance issues where requests take between 500 milliseconds and 2.5 seconds, or less than 60 seconds. It is also useful for identifying high CPU usage causing app downtime. In contrast, a memory dump is more suitable for scenarios where requests take longer than 60 seconds, the application hangs or deadlocks, or there are issues related to high memory usage or app crashes due to fatal exceptions.

Keywords

Microsoft Azure, Azure App Service, .NET Profiler Trace, ASP.NET performance, Azure debugging tools, .NET performance issues, Azure diagnostic tools, Collect .NET Profiler Trace, Analyze .NET Profiler Trace, Azure portal, Performance troubleshooting, ASP.NET application, Slow ASP.NET app, Azure Standard SKU, Always On setting, Memory dump vs profiler trace, PerfView analysis, Azure performance diagnostics, .NET application profiling, Diagnose ASP.NET slowness, Azure app performance, High CPU usage ASP.NET, Azure app diagnostics, .NET Profiler configuration, Azure App Service performance