troubleshooting
125 TopicsImproved AlwaysOn Availability Group Lease Timeout Diagnostics
First published on MSDN on Feb 23, 2016 When your AlwaysOn availability group is configured for automatic failover, you may find your availability group failed over, or if configured for manual failover, you may observe your availability group transition from the PRIMARY role to the RESOLVING role, during which users cannot access the availability group databases.21KViews0likes0CommentsEffectively troubleshoot latency in SQL Server Transactional replication: Part 1
Are you struggling with latency issues in SQL Server Transactional replication? This article provides clear, step-by-step instructions to effectively troubleshoot and resolve these challenges. Dive into proven techniques and best practices that will help you enhance your SQL Server's performance and ensure seamless data replication. Don't let latency slow you down—master the art of SQL Server troubleshooting today! I hope you find this teaser engaging! If you need any adjustments or additional content, feel free to let me know. Thanks to Collin Benkler, Senior Escalation Engineer in Microsoft for SQL Server for his valuable feedbacks.5.2KViews4likes4CommentsHow to troubleshoot “SQL Server is terminating because of fatal exception c06d007e”
Case 1, with Dump generated Issue definition: ======== Customer have SQL 2008/2012/2014 running on the same server, and then applied the SP4 for SQL Server 2012 in Windows update. But after this customer saw some issue from application side, then you rollback the update. And then Customer applied the SP3 for SQL Server 2012, but after that the SQL Server service can’t startup with following message and dump generated. Customer also tried to repair SQL Server many times, it doesn’t work. 2018-02-22 18:31:43.52 Server Error: 17311, Severity: 16, State: 1. 2018-02-22 18:31:43.52 Server SQL Server is terminating because of fatal exception c06d007e. This error may be caused by an unhandled Win32 or C++ exception, or by an access violation encountered during exception handling. Check the SQL error log for any related stack dumps or messages. This exception forces SQL Server to shutdown. To recover from this error, restart the server (unless SQLAgent is configured to auto restart). 2018-02-22 18:31:43.53 Server Using 'dbghelp.dll' version '4.0.5' 2018-02-22 18:31:43.53 Server **Dump thread - spid = 0, EC = 0x0000000000000000 2018-02-22 18:31:43.53 Server ***Stack Dump being sent to E:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\LOG\SQLDump0094.txt 2018-02-22 18:31:43.53 Server * ******************************************************************************* 2018-02-22 18:31:43.53 Server * 2018-02-22 18:31:43.53 Server * BEGIN STACK DUMP: 2018-02-22 18:31:43.53 Server * 02/22/18 18:31:43 spid 11352 2018-02-22 18:31:43.53 Server * 2018-02-22 18:31:43.53 Server * ex_handle_except encountered exception C06D007E - Server terminating Issue analysis and troubleshooting thinking ======== In the live meeting with customer, we first checking the setup log. And we saw that the SP4 has been applied, but after a while there is another setup folder generated reporting the failed error message: Exception type: Microsoft.SqlServer.Chainer.Infrastructure.ChainerInvalidOperationException Message: The NT service 'SQLSERVERAGENT' could not be started. HResult : 0x84b20001 FacilityCode : 1202 (4b2) ErrorCode : 1 (0001) Data: HelpLink.EvtType = 0xAE7B8BF3@0xF762992B@1202@1 DisableWatson = true Stack: at Microsoft.SqlServer.Configuration.SetupExtension.RunPatchAllInstanceAction.RestoreServiceStates() at Microsoft.SqlServer.Configuration.SetupExtension.RunPatchAllInstanceAction.ExecuteAction(String actionId) at Microsoft.SqlServer.Chainer.Infrastructure.Action.Execute(String actionId, TextWriter errorStream) at Microsoft.SqlServer.Setup.Chainer.Workflow.ActionInvocation.ExecuteActionHelper(TextWriter statusStream, ISequencedAction actionToRun, ServiceContainer context) Then we get the dump file from customer and performed a detailed analysis . This callstack seems that we have issue while Initialize Server Components(InitServerComponents), and then we get the detailed component name “SqlServerSpatial110.dll” # Child-SP RetAddr Call Site 00 00000000`01bf9cf8 000007fe`fd3810ac ntdll!ZwWaitForSingleObject+0xa 01 00000000`01bf9d00 00000000`ff6af3ee KERNELBASE!WaitForSingleObjectEx+0x79 02 00000000`01bf9da0 00000000`ff6af874 sqlservr!CDmpDump::DumpInternal+0x20e 03 00000000`01bf9e40 00000000`ff6af919 sqlservr!CDmpDump::DumpFilter+0x64 04 00000000`01bf9f10 00000000`74d712e3 sqlservr!`CDmpDump::Dump'::`1'::filt$0+0x19 05 00000000`01bf9f40 00000000`7769bf9d msvcr100!__C_specific_handler+0x97 06 00000000`01bf9fb0 00000000`776704ca ntdll!RtlpExecuteHandlerForException+0xd 07 00000000`01bf9fe0 00000000`77673c1f ntdll!RtlDispatchException+0x45a 08 00000000`01bfa6c0 000007fe`fd39a06d ntdll!RtlRaiseException+0x22f 09 00000000`01bfb070 00000000`ff6af8ec KERNELBASE!RaiseException+0x39 0a 00000000`01bfb140 000007fe`cd6bd4c4 sqlservr!CDmpDump::Dump+0x4c 0b 00000000`01bfb180 000007fe`cd6bdf06 sqllang!SQLDumperLibraryInvoke+0x2e4 0c 00000000`01bfb420 000007fe`cd684b6b sqllang!CImageHelper::DoMiniDump+0x426 0d 00000000`01bfb610 000007fe`cd683f6a sqllang!stackTrace+0xbdb 0e 00000000`01bfd050 000007fe`e0312a75 sqllang!stackTraceCallBack+0xca 0f 00000000`01bfd090 00000000`775abc20 sqldk!ex_handle_except+0x125 10 00000000`01bfd2e0 00000000`776a9035 kernel32!UnhandledExceptionFilter+0x160 11 00000000`01bfd3c0 00000000`77687398 ntdll!RtlUserThreadStart$filt$0+0x45 12 00000000`01bfd3f0 00000000`7769bf9d ntdll!__C_specific_handler+0x8c 13 00000000`01bfd460 00000000`776704ca ntdll!RtlpExecuteHandlerForException+0xd 14 00000000`01bfd490 00000000`77673c1f ntdll!RtlDispatchException+0x45a 15 00000000`01bfdb70 000007fe`fd39a06d ntdll!RtlRaiseException+0x22f 16 00000000`01bfe520 000007fe`ccb7bc0b KERNELBASE!RaiseException+0x39 17 00000000`01bfe5f0 000007fe`cdb8e40f sqllang!__delayLoadHelper2+0x199 18 00000000`01bfe6a0 000007fe`cdb49a75 sqllang!_tailMerge_SqlServerSpatial110_dll+0x3f 19 (Inline Function) --------`-------- sqllang!InitializeSpatialNativeImpl+0x26 1a 00000000`01bfe710 00000000`ff69af2a sqllang!SQLLangInitComponents+0x275 1b 00000000`01bfe770 00000000`ff69c094 sqlservr!InitServerComponents+0x8a 1c 00000000`01bfe7d0 00000000`ff693c48 sqlservr!sqlservr_main+0x3b4 1d 00000000`01bfea20 000007fe`fe32a82d sqlservr!sqlservr_main_thread+0x178 1e 00000000`01bffd30 00000000`775259cd sechost!ScSvcctrlThreadA+0x25 1f 00000000`01bffd60 00000000`7768383d kernel32!BaseThreadInitThunk+0xd 20 00000000`01bffd90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d Looked at the DLL being loaded szDll : 0x7fecdeef9c0 : "SqlServerSpatial110.dll" [Type: char *] Then we checked the SqlServerSpatial110.dll both in C:\Windows\System32 and C:\Windows\SystemWOW64. System32 WOW64: And then we checked this SqlServerSpatial110.dll properties in another working server as below. Then we got the root cause that the current server SqlServerSpatial110.dll original filename is SqlServerSpatial120.dll(it is SQL 2014). But in this case, somehow is has been changed, but we don’t know why, we only get to the result. Then we tried to copy the correct one from another server to current server(system32 and systemWOW64 respectively). But this doesn’t work. After we restart the SQL Server it failed again, with same error. Then we come up with another solution, we can rename the current SqlServerSpatial110.dll to SqlServerSpatial110.dll.old and then launch the repair again. This worked. After repairing, we can see the new SqlServerSpatial110.dll has been created automatically with correct SP3 version(same with the current SQL Server version) Root cause: ======== Somehow, SqlServerSpatial110.dll has been changed. Resolution: ======== Rename the SqlServerSpatial110.dll under C:\Windows\System32 and C:\Windows\SystemWOW64 Launch the repair for SQL Server 2012. This resolved this issue successfully. Case 2, there is no dump generated Issue definition: ======== After you patch the SQL Server 2016, 9 instances failed to start due to the following error. 2020-02-22 18:43:17.14 Server Error: 17311, Severity: 16, State: 1. 2020-02-22 18:43:17.14 Server SQL Server is terminating because of fatal exception c06d007e. This error may be caused by an unhandled Win32 or C++ exception, or by an access violation encountered during exception handling. Check the SQL error log for any related stack dumps or messages. This exception forces SQL Server to shutdown. To recover from this error, restart the server (unless SQLAgent is configured to auto restart). 2020-02-22 18:43:17.14 Server Using 'dbghelp.dll' version '4.0.5' 2020-02-22 18:43:17.14 Server **Dump thread - spid = 0, EC = 0x0000000000000000 2020-02-22 18:43:17.14 Server * ******************************************************************************* 2020-02-22 18:43:17.14 Server * 2020-02-22 18:43:17.14 Server * BEGIN STACK DUMP: 2020-02-22 18:43:17.14 Server * 02/22/20 18:43:17 spid 8832 2020-02-22 18:43:17.14 Server * 2020-02-22 18:43:17.14 Server * ex_handle_except encountered exception C06D007E - Server terminating 2020-02-22 18:43:17.22 Server Stack Signature for the dump is 0x00000001A124A749 2020-02-22 18:43:17.22 Server Unable to create dump because SQLDUMPER library is not available. 2020-02-22 18:43:17.22 Server SQL Server shutdown has been initiated Issue analysis: ======== Based on the log, this issue should happened when loading some DLL. But as there no dump generated due to Unable to create dump because SQLDUMPER library is not available(from the following analysis, we can also see that dbgheko.dll not loaded yet). So we captured a process monitor, and then analyze the process monitor log. From the process monitor log, we can’t get the callstack which can help us to identify the DLL. So we are not able to find the DLL from the callstack, but we can compare the DLL which has already been loaded with a working server, and find out all the non-loaded DLL, and maybe we can find more clue. Compared the working server loaded DLL and this problematic server. All the yellow-mark DLL are non-loaded DLL in the problematic SQL. Working Server: <<<<<<<<<<<<<<< SQLNCLIRDAR11.RLL 0x18261ee0000 0x38000 c:\Program Files\Microsoft SQL Server\140\Shared\1033\SQLNCLIRDAR11.RLL Microsoft Corporation 2011.0110.5069.066 ((SQL11_OLEDB_AAD).160321-0812 ) 3/21/2016 11:56:15 PM instapi140.dll 0x7ffc21c80000 0x13000 c:\Program Files\Microsoft SQL Server\140\Shared\instapi140.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:36:59 AM sqlnclirda11.dll 0x59e10000 0x359000 c:\Program Files\Microsoft SQL Server\140\Shared\sqlnclirda11.dll Microsoft Corporation 2011.0110.5069.066 ((SQL11_OLEDB_AAD).160321-0812 ) 3/21/2016 11:57:40 PM BatchParser.dll 0x7ffbf7990000 0x2c000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\BatchParser.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:36:47 AM dbghelp.dll 0x59c70000 0x196000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\dbghelp.dll Microsoft Corporation 6.12.0002.633 (debuggers(dbg).100201-1211) 2/2/2010 4:15:44 AM ftimport.dll 0x60000000 0x25000 c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\ftimport.dll Microsoft Corporation 12.0.7031.9999 11/1/2006 12:27:20 PM hkcompile.dll 0x7ffbf86f0000 0x162000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\hkcompile.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:47:36 AM hkengine.dll 0x7ffbf7fe0000 0x70b000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\hkengine.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:51:33 AM hkruntime.dll 0x7ffbf8860000 0x2d1000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\hkruntime.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:52:55 AM MSFTE.DLL 0x49980000 0x3ae000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\MSFTE.DLL Microsoft Corporation 12.0.6828.0 8/28/2006 7:15:55 PM opends60.dll 0x7ffbf8cb0000 0x9000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\opends60.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:36:08 AM qds.dll 0x7ffbf8b80000 0x12e000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\qds.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:38:39 AM sqlevn70.rll 0x17cc64a0000 0x310000 c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\Resources\1033\sqlevn70.rll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:36:13 AM xplog70.RLL 0x183ea3d0000 0x4000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\Resources\1033\xplog70.RLL Microsoft Corporation 2017.0140.1000.169 ((SQLServer).170822-2340) 8/23/2017 7:50:34 AM xpstar.rll 0x183e93a0000 0xd000 c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\Resources\1033\xpstar.rll Microsoft Corporation 2017.0140.1000.169 ((SQLServer).170822-2340) 8/23/2017 7:50:29 AM secforwarder.dll 0x7ffbf8b40000 0xa000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\secforwarder.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:45:38 AM SqlAccess.dll 0x7ffbf7f60000 0x75000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\SqlAccess.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:37:39 AM sqldk.dll 0x7ffbf8cc0000 0x4fc000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqldk.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:35:28 AM sqllang.dll 0x7ffbf9af0000 0x2782000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqllang.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:44:02 AM sqlmin.dll 0x7ffbfc280000 0x31fe000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqlmin.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:38:05 AM SQLOS.dll 0x7ffbff480000 0x7000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\SQLOS.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:45:34 AM SQLSCM.dll 0x7ffbf7e30000 0x13000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\SQLSCM.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:40:48 AM sqlservr.exe 0x7ff7869e0000 0x7f000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqlservr.exe Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:35:24 AM sqlTsEs.dll 0x7ffbf91c0000 0x92f000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqlTsEs.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 9:35:33 AM svl.dll 0x7ffbf8b50000 0x2e000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\svl.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:57:13 AM xplog70.dll 0x7ffc2ea50000 0x14000 c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\xplog70.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:55:14 AM xpsqlbot.dll 0x7ffbf7cd0000 0x8000 C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\xpsqlbot.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:38:05 AM xpstar.dll 0x7ffbf7c60000 0x6e000 c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\xpstar.dll Microsoft Corporation 2017.0140.3238.01 ((SQLServer2017-CU14).190913-2228) 9/14/2019 6:47:53 AM Non_working Server <<<<<<<<<<<<<<< HKRUNTIME.DLL 0x7fffea270000 0x2d0000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\HKRUNTIME.DLL 1/1/1970 8:00:00 AM OPENDS60.DLL 0x7ffffe550000 0x9000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\OPENDS60.DLL 1/1/1970 8:00:00 AM QDS.DLL 0x7fffeac80000 0xdc000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\QDS.DLL 1/1/1970 8:00:00 AM SQLEVN70.RLL 0x7fffe5180000 0x2f0000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\RESOURCES\1033\SQLEVN70.RLL 1/1/1970 8:00:00 AM SQLBOOT.DLL 0x7fffeb230000 0x2e000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLBOOT.DLL 1/1/1970 8:00:00 AM SQLDK.DLL 0x7fffe97b0000 0x4c8000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLDK.DLL 1/1/1970 8:00:00 AM SQLLANG.DLL 0x7fffddf90000 0x25c7000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLLANG.DLL 1/1/1970 8:00:00 AM SQLMIN.DLL 0x7fffe0560000 0x24fa000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLMIN.DLL 1/1/1970 8:00:00 AM SQLOS.DLL 0x7fffff390000 0x7000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLOS.DLL 1/1/1970 8:00:00 AM SQLSERVR.EXE 0x7ff7c29d0000 0x68000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLSERVR.EXE 1/1/1970 8:00:00 AM SQLTSES.DLL 0x7fffe5470000 0x88b000 \Device\CSVVolume137\MSSQL13.KCOM1S130EDAM01\MSSQL\BINN\SQLTSES.DLL 1/1/1970 8:00:00 AM We can see HKRUNTIME already loaded in our problematic SQL, so the next most possible DLL should be the HKEngine.dll HKCompile.dll. Then check them in your environment, they do have some problem in those DLLs, for example the size/the modified date. Then we rename those DLL, for example HKEngine.dll.Old, and then run repair SQL Server, and after that we confirmed 1 instance start successfully. Solution: ======== HKEngine.dll HKCompile.dll has some problem when SQL Server trying to load them. So rename them and then run SQL Repair, and new DLL will be generated, then SQL Server start successfully.21KViews1like1Comment101 of Troubleshooting SQL Server on Linux
Based on our extensive experience with customers using SQL on Linux, we have compiled a guide outlining fundamental troubleshooting steps and available tools to assist in resolving SQL on Linux issues. This guide aims to make it easier for SQL DBAs who have primarily worked on Windows operating systems over the years. In this article, we will normalize three checks in Linux that we typically perform in Windows to start troubleshooting issues with SQL Server: System logs SQL Server logs Task Manager Note: The screenshots in the below examples are taken from RHEL and Ubuntu machines, and while the Linux flavors are different, the commands are similar in both environments. System logs: It contains the log messages that the system processes and applications, and these messages are written as plain text log files. The rsyslog service keeps various log files in the /var/log directory. You can open these files using native commands such as tail , head , more , less , cat , and so forth, depending on what you are looking for. In RHEL and SLES, they are called messages, while in Ubuntu, you would find them as syslog. RHEL and SLES: The primary system log file is located at /var/log/messages . Ubuntu: The main system log file is /var/log/syslog . Log files and journals are crucial to a system administrator's work. They reveal a great deal of information about a system and are instrumental during troubleshooting and auditing. For example, in RHEL, to display boot and other kernel messages, view /var/log/messages : [server]$ cat /var/log/messages Use grep and other filtering tools to gather more specific events from a file. You can also use tail to view files as they are updated. [server]$ tail -f /var/log/messages Check the /var/log/secure file to view users and their activities: [server]$ tail -f /var/log/secure Similarly in Ubuntu, we have /var/log/syslog What does the syslog or messages file contain? how do we read them? These log files contain events and messages generated by the kernel, applications, and users that log into the system. The logs are written in the below format. The timestamp indicates the time when a log entry was created in the format MMM dd HH:mm:ss. Notice that this format does not include a year. Hostname is the host or system that originally create the message. Application is the application that created the message. Message contains the actual details of an event. Since the message/syslog could contain thousands/millions of lines as log entries, using 'cat' command alone might not be the right choice at times. I suggest using grep for filtering text through the files. Using tail -f command allows you to read the current log file in real time. You may combine it with 'grep' to filter on desired text. An alternate method to validate the system events is via the journald Systemd-journald : journald is a component of systemd responsible for handling logging. It captures logs, records them, and makes them easy to find. Unlike traditional syslog implementations, journald offers features like structured logging, indexing for fast search, access control, and signed messages. The systemd-journald service does not keep separate files, as rsyslog does. The idea is to avoid checking different files for issues. Systemd-journald saves the events and messages in a binary format that cannot be read with a text editor. You can query the journal with the journalctl command. journald stores logs in memory (RAM) without persistent storage (by default), while the Traditional syslog (e.g., /var/log/messages) persists log data to flat files. To show all event messages, use: [server]$ journalctl To view journal entries for today, use: [server]$ journalctl --since today To check for messages related to the sql server service for the past hour, you can run: journalctl --unit mssql-server.service --since "1 hour ago" SQL Server Error logs: The error log contains informational messages, warnings, and information about critical events. The error log also contains information about user-generated messages and auditing information such as logon events (success and failure).In Linux the default SQL Errorlog location is /var/opt/mssql/log (Keep in mind that you will require superuser permissions to traverse these folders and logs.) This is also the default location for XEvents, dump files and trace files. Comparing Error and System Logs Output You can use both the SQL Server error log and the system logs to identify the cause of problems. For example, while monitoring the SQL Server error log, you may encounter error messages that do not contain cause information. By comparing the dates and times for events between these logs, you can narrow the list of probable causes. Consider the following scenario: We receive an alert indicating that the application is unable to connect to SQL following a maintenance activity over the weekend. We log into the server to verify if SQL Server is operational. Upon inspection, we find that the SQL process is not running. We then check the status and attempt to start the SQL Server to see if it comes online. On restarting SQL Server, we observe the following information in the error logs. Next, we need to examine the system event logs, specifically the messages file in RHEL & SLES or the syslog file in Ubuntu, where events are recorded.We can use the following command to retrieve service logs from the recent boot: journalctl --unit mssql-server.service --boot --no-pager By validating the messages or syslog file within the same timeframe, we gather additional information. From the logs, we identify a permission issue on a specific folder or file that is preventing SQL Server from starting. Granting the necessary permissions should resolve the issue. Task Manager: Where is the Task Manager in Linux? There are command-line utilities that provide similar information to what we see in Windows. We will explore the top and htop utilities top The first line of numbers on the dashboard includes the time, how long your computer has been running, the number of people logged in, and what the load average has been for the past one, five, and 15 minutes. The second line shows the number of tasks and their states: running, stopped, sleeping, or zombie.The next 3 lines describe CPU, Memory utilization, Swap Memory of the server. The column details of the process are tabulated as below. To get into the details of a particular process, (In our case SQL Server) we can use the PID to get the further details of the SQL and the tasks that SQL is currently running. To get the child PID of SQLServer, use this short command and run the top command on the pid. pidof sqlservr | cut -d' ' -f1 top -p (pid output of the above command) We can also get them into single command as shown below, top -p $(pidof sqlservr | cut -d' ' -f1) We see the utilization of various tasks within SQL Server, that are consuming the resources. Another utility that provides similar information is htop . The key difference is that htop offers a more user-friendly experience with its use of colors and graphs, compared to the top command. While top and htop provide valuable information, there are additional command-line tools such as vmstat , System Activity Information ( sar ), iostat , and others. I suggest running these commands on your test machines and monitoring the output to become familiar with checking performance metrics on a Linux server. References: SLES: https://documentation.suse.com/sles/12-SP5/html/SLES-all/cha-util.html# RHEL:https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-performance_monitoring_tools-built_in_command_line_tools Ubuntu: https://manpages.ubuntu.com/manpages/jammy/man1/top.1.html#2.%20summary%20display I hope this serves as a helpful introduction to basic troubleshooting for SQL on Linux.3.1KViews1like0CommentsOS Hang or Out of Memory due to SQL Ser... No Wait, it's SQL Analysis Services (SSAS)
First published on MSDN on Jan 12, 2018 Recently, we have observed a number of cases where DBAs or application developers are complaining about out-of-memory errors or even machine not responding (hangs) despite the fact that there is plenty of available memory on the system.9.9KViews5likes1CommentTroubleshooting REDO queue build-up (data latency issues) on AlwaysOn Readable Secondary Replicas using the WAIT_INFO Extended Event
First published on MSDN on Jan 06, 2015 PROBLEM You have confirmed excessive build up of REDO queue on an AlwaysOn Availability Group secondary replica by one of the following methods: Querying sys.53KViews1like5Comments