Azure Database for PostgreSQL Flexible Server
80 Topicspg_signal_autovacuum_worker Role for Managing Autovacuum in Azure Database for PostgreSQL
Have you ever faced a situation where your database is taking longer than expected to execute a DDL command? It can be a frustrating experience when a pending command execution blocks your workflow. If you’re wondering which process might be blocking your DDL command, there could be different reasons. One of the primary reasons could be when autovacuum process is executing for the table in the background. Let me explain this with the help of an example. Consider we have a table "accounts” in our Azure Database for PostgreSQL flexible server instance database. We want to add a field and execute an ALTER statement on it. When this statement is in the process of execution, you may observe the database has significantly slowed down or stopped responding to the read/write queries for a while, resulting in a latency spike. In this scenario, the first approach would be to make sure there are no concurrent processes working on the table before executing the ALTER statement. If one of these processes is autovacuum, you will have to wait until the autovacuum process is executed on the table, which will cause a delay in committing this statement. In managed services, giving a system-level access to users can cause severe security and database integrity concerns, so users cannot manage the critical background processes like autovacuum in a controlled manner. With the recent update, we have backported 'pg_signal_autovacuum_worker' role from PostgreSQL version 18, with this approach it is now possible to control the autovacuum process without giving system level access to the users by using 'pg_terminate_backend' command in Azure Database for PostgreSQL Flexible Server in PostgreSQL versions 15 and higher. This role helps non-super users interact with autovacuum process in a secure and controlled manner. Let’s learn more about the autovacuum process and how this new role 'pg_signal_autovacuum_worker' helps you to manage the process more securely with controlled access for non-super users. What is the autovacuum process? PostgreSQL uses various maintenance processes to handle heavy server loads, one of them being autovacuum. Autovacuum removes obsolete versions of tuples that were created by UPDATE or DELETE statements, and that aren’t accessible anymore by any outstanding transaction. The autovacuum_naptime and autovacuum_max_workers are important parameters for the autovacuum process. This process gets launched every autovacuum_naptime seconds, the autovacuum worker gets triggered based on the number of proportion of dead tuples in the table. You can read more about the autovacuum process and how it’s triggered here. Common Problems with autovacuum Autovacuum process is an important background process as it helps with regular cleanup of dead tuples. However if it conflicts with a long running transaction or in a write-heavy environment there can be following issues: Autovacuum Blocking DDL Operations: Normal Autovacuum: Autovacuum jobs cancel themselves after deadlock_timeout (default value is 1 second) when a user triggers DDL on the table. But in some cases, users can experience significant latency up to the value of deadlock_timeout while executing read/write queries on the table. Wraparound protection Autovacuum: This type of autovacuum does not cancel itself and blocks DDL until the job is completed. Challenges with Updating Parameters in Ongoing Autovacuum Processes: Autovacuum configuration can be tuned using server parameters like autovacuum_naptime, autovacuum_max_workers, autovacuum_analyze_scale_factor etc. When you have an autovacuum process that is running for a long time, the changed parameters do not get reflected in already running autovacuum process. Azure Database for PostgreSQL - Flexible Server customers have faced the challenge of not being able to terminate the process securely with more granular control over the autovacuum process. If the autovacuum process is triggered simultaneously with a business-critical DDL process and users do not have controlled access on this process, it can lead to frustrating experience. In our latest release, we have implemented a solution that is secure, backward-compatible, and easy to integrate. New role pg_signal_autovacuum_worker We addressed this challenge by backporting the ‘pg_signal_autovacuum_worker’ role (which will be introduced in PostgreSQL 18) to all Azure Database for PostgreSQL flexible server versions 15 and higher. The ‘pg_signal_autovacuum_worker’ role helps you manage autovacuum tasks more effectively and allows non-superusers to terminate the autovacuum processes. Users get controlled access for the autovacuum process with this new role. Advantages of backporting the pg_signal_autovacuum_worker role Backporting the 'pg_signal_autovacuum_worker' role has some important advantages: Compatibility: Backporting this role to previous versions ensures a seamless upgrade experience for existing PostgreSQL servers. Security: Users do not have access to this role by default. The role is granted to non-superusers without granting broader system-level access. It gives a more granular and secure way to manage autovacuum processes without compromising database integrity. Steps to add pg_signal_autovacuum_worker role 1. Let's create a new user using the below command: CREATE USER john WITH PASSWORD 'Pass123'; 2. Next, grant 'pg_signal_autovacuum_worker' role to the newly created user GRANT pg_signal_autovacuum_worker TO john; 3. Now, confirm if 'pg_signal_autovacuum_worker' role is assigned SELECT oid, rolname FROM pg_roles WHERE Pg_has_role('john', oid, 'member'); oid | rolname -------+----------------------------- 8916 | pg_signal_autovacuum_worker 24826 | john 4. Now 'john' without being a member of 'azure_superuser' can terminate the autovacuum process. SELECT Pg_terminate_backend(pid) FROM pg_stat_activity WHERE backend_type = 'autovacuum worker'; Conclusion Managing autovacuum processes has always been tricky, especially for non-superusers who couldn’t directly control them in a managed service environment. Now with the introduction of the 'pg_signal_autovacuum_worker' role available in Azure Database for PostgreSQL flexible server, this is possible. Database users have a secure and flexible way to manage autovacuum process without needing superuser access, making database maintenance easier and more efficient.Boost Your Postgres Server on Azure with Enhanced Azure Advisor Performance Recommendations!
Are you puzzled about why your server or database performance has suddenly degraded or become slow? This can be quite frustrating, especially when you are trying to maintain optimal performance for your applications. There could be numerous reasons behind this issue. It is essential to identify the root cause to implement the right solution and get your server back to its peak performance. Let us dive deeper into this and explore potential solutions together. In this blog post, I will take you on a journey through the comprehensive Azure Advisor performance recommendations specifically tailored for Azure Database for PostgreSQL flexible server. These insights will not only alert you to potential root causes but also guide you in resolving the server issues you are facing. We have you covered from hardware optimizations and logging enhancements to workload management, autovacuum tuning, bloat reduction, and managing inactive replication slots. Let's dive in and unlock the full potential of your server using the performance recommendations provided here. To ensure you receive proactive notifications regarding Azure Advisor recommendations at the server or subscription level, you have the option to configure alerts to be received via email or text message. Setting up Alerts for Recommendations Stay ahead of the game by setting alerts for Azure Advisor recommendations for your Azure Database for PostgreSQL flexible server! Whenever a recommendation is made for one of your resources, an event is triggered, ensuring you are always in the loop. You can customize these alerts to be recommendation-specific, targeting types of recommendations, or set them up at the Azure subscription level to receive notifications for all recommendations related to the resources within the specified subscription. This way, you will never miss a critical insight and can take proactive steps to maintain optimal server performance, ensuring you are always informed and ready to act! You can set up alerts based on the Impact level, Category, and Recommendation type. On Azure portal, search for Advisor and proceed. Go to Monitoring and Alerts and click on New Advisor Alert. Create the Advisor Alert and set up an Action Group for the recommendation event. When a recommendation alert is triggered, you will receive notification via email or text message according to your Action Group settings. This alert is sent once per recommendation to ensure that you are informed promptly for timely action. You have taken the proactive step of setting up an alert to notify you via email if your server enters any vulnerable state. This ensures that you are immediately informed of any potential issues. For instance, you recently received the following recommendation alert in your email, highlighting a specific area that requires attention. This way, you can promptly address any concerns and maintain the optimal performance of your server. Scenarios in which Azure Advisor recommendation will help keep your Azure Database for PostgreSQL flexible server performant. Let us explore the various scenarios in which Azure Advisor can proactively assist you in identifying and resolving performance issues with your PostgreSQL server. Scenario 1- Inactive Logical Replication slots You have received an alert indicating that your server has an inactive logical replication slot. The outlined section below highlights that this is a high-impact performance. This alert tells me a recommendation named " Review your server for inactive Logical Replication Slots" has been triggered. Th next step is to go to your server on Azure Portal. You will be able to see Recommendations on the Overview page of your server on the portal. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information about the recommendation, including the specific details and the reason for receiving it, which is provided in the "Reason" column. The "Reason" column for this recommendation states: "Found inactive logical replication slot(s). Here are the replication slots that have been inactive for the longest: ['myslot']." Click on the "Recommended Action" to proceed with the necessary steps to resolve your server's vulnerable state. You will be directed to a page that provides the resolution for the server issue encountered. The necessary action is outlined in the "Recommended Action" section. In this scenario, you will need to drop the inactive logical replication slot as mentioned below. You can then check your replication slot for it to be active or inactive. Resolution: Proceed to remove the inactive logical replication slot as shown below to maintain optimal server performance. After dropping the inactive replication slot, you will notice within 24 hours that the recommendation no longer appears on the Overview page under Recommendations in your Azure Portal. Scenario 2 – Checkpoints happening frequently You have now received an alert indicating that your server has a recommendation stating the server has frequent checkpoint warnings. The outlined section below highlights that this is a high-impact performance. This alert tells me a recommendation named "PostgreSqlCheckpointHappeningFrequently" has been triggered. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: 'Checkpoints are occurring frequently over 3 consecutive hours'. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, you will need to update your max_wal_size server parameter. Clicking on the 'Recommended Action' will take you to the Server parameters page of your server as shown below. Search for the max_wal_size parameter and increase it accordingly. After updating the max_wal_size parameter and resolving the frequent checkpoint warnings, the recommendation will disappear from the Overview page under Recommendations in your Azure Portal within 24 hours. Scenario 3 – High Bloat ratio Let us discuss the alert you received regarding a high bloat ratio on your server(s). I will guide you through the troubleshooting process to understand what is happening on the server and how to resolve the issue. You have now received an alert indicating that your server has a recommendation stating that your server is having a high bloat ratio. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: “Here are the Database(s) with the highest BloatRatio: ["testdb"] has a Bloat Ratio(dead/live + dead)[86.08]”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, the page will redirect you to the Troubleshooting guides of your server on the Azure portal. Navigate to the Autovacuum Monitoring tab, where you will find information on Bloat Ratio, Live vs. Dead tuples, Vacuum and Analyze, Autovacuum workers in progress, Autovacuum per table, and Enhanced metrics by server and by database. Navigate through each section to learn more about the server’s bloat and autovacuum information as shown below. The Autovacuum monitoring troubleshooting guide is designed to assess the health of autovacuum or vacuum on Azure Database for PostgreSQL Flexible Server. During autovacuum execution, it removes dead tuples, freeing up disk space and improving query performance. Additionally, it updates the visibility map to track pages without outdated or deleted data, reducing unnecessary I/O operations. Another critical function of autovacuum is to prevent transaction ID wraparound situations, which could result in data loss. By analyzing the results of this troubleshooting guide, you can take proactive steps to optimize autovacuum and maintain the long-term health of your database. As a short-term solution, you can VACUUM ANALYZE the database or specific frequently used tables with high bloat to clean up dead tuples and bring your server to a performant state. Scenario 4 – Autovacuum turned OFF Let us discuss the alert you received regarding autovacuum server parameter on your server(s). I will guide you through the troubleshooting process to understand what is happening on the server and how to resolve the issue. You have now received an alert indicating that your server has a recommendation stating that your server has autovacuum set to OFF. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: “atuovacumm is turned OFF, and the Bloat Ratio on these Database(s) ["postgres","testdb"] and Schema(s) ["pg_toast","public"] is greater than 50%”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, you will need to turn on autovacuum server parameter. Clicking on the 'Recommended Action' will take you to the Server parameters page of your server as shown below. Search for the autovacuum parameter and turn it ON. Scenario 5 – Wraparound Let us walk through a scenario where you receive a Wraparound recommendation alert. We will cover how to troubleshoot and resolve the issue to maintain your server's performance. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: “The server has utilized 99.85% of transactions. Now 2999999 transactions remaining.”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, you will need to perform the mitigation steps provided in the troubleshooting guide. Clicking on the 'Recommended Action' will take you to the troubleshooting guides. Please visit the Autovacuum Blockers section to get into more details on your server behavior pertaining to wrap around. Scroll to the bottom of the page to find the Recommendations to resolve the issue as shown in the screenshot below. Follow through the Recommendations to proactively enable your server to perform effectively. Scenario 6 – logging parameters Too much logging always hinders the server’s performance; hence it is important to understand the need for logging and act accordingly. Let's walk through a scenario where you receive logging parameters recommendation alert (log_duration, log_min_duration_statement, log_error_verbosity, log_statement, log_statement_stats). We will cover how to troubleshoot and resolve the issue to maintain your server's performance. Click on each of the highlighted recommendations above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation will state the value of the parameter set. For example: “log_statement_stats is set to ON”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In these scenarios, you will be directed to the server parameters page on Azure portal to update the server parameter values as provided in the “Recommended Action”. Setting the server parameter values per the recommendation action enables your server resources to be utilized efficiently. Scenario 7 – Long running transaction Let us walk through a scenario where you receive a long running transaction recommendation alert. We will cover how to troubleshoot and resolve the issue to maintain your server's performance. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: “Here are the longest running transactions. PID(s) [311568] Duration (hours): [29]”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, you will need to perform the mitigation steps provided in the troubleshooting guide. Clicking on the 'Recommended Action' will take you to the troubleshooting guides. Please visit the High CPU -> Long running transactions section to find the recommended actions to resolve the issue. Scroll to the bottom of the page to find the Recommendations to resolve the issue as shown in the screenshot below. Follow the recommendation and terminate the session for the Pid’s identified to keep your server performant. Scenario 8 – High CPU Utilization Let us walk through a scenario where you receive a high CPU utilization recommendation alert. We will cover how to troubleshoot and resolve the issue to maintain your server's performance. Click on the highlighted recommendation above. This action will direct you to a detailed page about the recommendation alert you received. On this page, you will find comprehensive information in the Recommendation details section and the reason for receiving it, which is provided in the 'Reason' column. The 'Reason' column for this recommendation states: “90% or higher CPU utilization for more than 2 consecutive hours”. Click on the 'Recommended Action' to proceed with the necessary steps to resolve your server's vulnerable state. The necessary action is outlined in the 'Recommended Action' section. In this scenario, you will need to perform the mitigation steps provided in the troubleshooting guide. Clicking on the 'Recommended Action' will take you to the troubleshooting guides. Please visit the High CPU section to find the reasons for the CPU utilization to be high and take recommended actions to resolve the issue. Review each of the highlighted tabs in the screenshot to identify the issue. Each tab contains recommendations to resolve the specific causes of high CPU utilization. As there might be multiple reasons, make sure to go through all the tabs and follow the recommendations provided accordingly. Conclusion: Imagine having a proactive ally that helps you stay ahead of potential issues with your server; That's exactly what Azure Advisor offers! In many scenarios, Azure Advisor recommendations will empower you to clearly identify and depict your server's vulnerable state. Not only does it highlight these vulnerabilities, but it also provides you with actionable recommendations to ensure your server remains highly performant. With Azure Advisor by your side, you can confidently optimize your server's performance and security, making your operations smoother and more efficient.