Availability Group Database Reports Not Synchronizing / Recovery Pending After Database Log File Inaccessible
You may find that one or more availability group databases is reported ‘Not Synchronizing / Recovery Pending’ on the primary replica or ‘Not Synchronizing’ on one of the secondary replicas. Despite this, your availability group replicas report they are in the primary role or secondary role,
This may occur if SQL Server is unable to access the database log files of your availability group database(s). The following blog describes the symptoms, cause and resolution for this problem when encountered on the primary or secondary replicas of your availability group.
Symptoms of Availability Group Database on Primary Replica
Primary Replica Database Reported Not Synchronizing / Recovery Pending
In SQL Server Management Studio’s Object Explorer, on the primary replica, the availability group database may be reported as ‘Not Synchronizing/Recovery Pending:
Primary Replica - Attempts to Access Database or Perform DML Fail
Attempts to access the affected database will fail. For example, an attempt to INSERT into a table in that database reports 9001, ‘The log for database…is not available.’
SQL Server Management Studio will report the database in ‘Not Synchronizing / Recovery Pending’ state.
Cause SQL Server unable to access database log file
This availability group database state could have been caused because SQL Server could not access the log file and or data file(s). In the event the log file is or was inaccessible, SQL Server will offline the database and report the database in Recovery Pending state. In this scenario, the database is defined in an availability group, and therefore will be reported in SQL Server Management Studio as ‘Not Synchronizing / Recovery Pending. Check the SQL Server error log to determine if this is the cause for reporting the Recovery Pending state. You will find a message 9001 reported that the ‘log for database xxx is not available.’
2017-07-13 06:42:44.74 spid1s Error: 17053, Severity: 16, State: 1.
2017-07-13 06:42:44.74 spid1s SQLServerLogMgr::LogWriter: Operating system error 21(The device is not ready.) encountered.
2017-07-13 06:42:44.74 spid1s Write error during log flush.
2017-07-13 06:42:44.75 spid51 Error: 9001, Severity: 21, State: 4.
2017-07-13 06:42:44.75 spid51 The log for database 'testdb' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
2017-07-13 06:42:44.75 spid51 State information for database 'testdb' - Hardened Lsn: '(34:840:1)' Commit LSN: '(34:824:2)' Commit Time: 'Jul 12 2017 1:19PM'
2017-07-13 06:42:44.75 spid23s Always On Availability Groups connection with secondary database terminated for primary database 'testdb' on the availability replica 'SN2' with Replica ID: {10b9a12b-afed-48f3-b79c-9ecc5cada754}. This is an informational message only. No user action is required.
2017-07-13 06:42:44.75 spid51 Database testdb was shutdown due to error 9001 in routine 'XdesRMFull::CommitInternal'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
2017-07-13 06:42:44.77 spid18s State information for database 'testdb' - Hardened Lsn: '(34:840:1)' Commit LSN: '(34:824:2)' Commit Time: 'Jul 12 2017 1:19PM'
2017-07-13 06:42:44.86 spid18s Error: 17053, Severity: 16, State: 1.
2017-07-13 06:42:44.86 spid18s fcb::close-flush: Operating system error (null) encountered.
Resolve Primary Replica Database Not Synchronizing / Recovery Pending
A thought might be to suspend and resume the availability group database but this will not solve this issue. Resume does not initiate recovery on the database, which is what is necessary to bring the database back into an online state.
Once you have confirmed the database files are accessible, use the ALTER DATABASE SET ONLINE command to initiate the recovery of the database which should bring the database online.
use master
go
alter database testdb set online
go
Here we online the database and now the database is once again online and in a Synchronized state.
The error log shows SQL Server go through recovery and re-establish connection with secondary replica.
2017-07-13 06:57:41.97 spid58 Setting database option ONLINE to ON for database 'testdb'.
2017-07-13 06:57:41.97 spid58 State information for database 'testdb' - Hardened Lsn: '(34:856:1)' Commit LSN: '(34:824:2)' Commit Time: 'Jul 12 2017 1:19PM
2017-07-13 06:57:41.97 spid58 State information for database 'testdb' - Hardened Lsn: '(34:856:1)' Commit LSN: '(34:824:2)' Commit Time: 'Jul 12 2017 1:19PM'
2017-07-13 06:57:42.00 spid58 State information for database 'testdb' - Hardened Lsn: '(34:856:1)' Commit LSN: '(34:824:2)' Commit Time: 'Jul 12 2017 1:19PM'
2017-07-13 06:57:42.00 spid58 Starting up database 'testdb'.
2017-07-13 06:57:42.17 spid58 2 transactions rolled forward in database 'testdb' (9:0). This is an informational message only. No user action is required.
2017-07-13 06:57:42.27 spid30s 0 transactions rolled back in database 'testdb' (9:0). This is an informational message only. No user action is required.
2017-07-13 06:57:46.98 spid36s Always On Availability Groups connection with secondary database established for primary database 'testdb' on the availability replica 'SN2' with Replica ID: {10b9a12b-afed-48f3-b79c-9ecc5cada754}. This is an informational message only. No user action is required.
Symptoms of Availability Group Database on Secondary Replica
Secondary Replica Database Reported Not Synchronizing
If SQL Server hosting the secondary replica is unable to access availability group database files, database is marked in the Not Synchronizing state. Note that sys.dm_hadr_database_replica_states and sys.databases still reports the database as Online.
Secondary Replica – Attempts to Suspend and Resume synchronization Fail
This might be your first instinct, when seeing an availability group database reported as ‘Not Synchronizing’ in SSMS. Even if the database log file becomes accessible again, synchronization is not resumed. Attempting to Suspend and then Resume will both appear to succeed:
However, querying sys.dm_hadr_database_replica_states will report that the database still is not synchronizing and the database is still reported Not Synchronizing in SSMS Object Explorer.
Cause SQL Server unable to access database log file
Again, confirm if the current availability group issue resulted when SQL Server was unable to access the database files. The error log on the secondary reports the problem during redo (term for availability group ongoing recovery of secondary availability group database) accessing log file and so the database is suspended.
2017-07-13 07:07:20.01 spid1s Error: 17053, Severity: 16, State: 1.
2017-07-13 07:07:20.01 spid1s SQLServerLogMgr::LogWriter: Operating system error 21(The device is not ready.) encountered.
2017-07-13 07:07:20.01 spid1s Write error during log flush.
2017-07-13 07:07:20.01 spid25s Error: 9001, Severity: 21, State: 5.
2017-07-13 07:07:20.01 spid25s The log for database 'testdb' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
2017-07-13 07:07:20.19 spid25s Error: 3313, Severity: 21, State: 2.
2017-07-13 07:07:20.19 spid25s During redoing of a logged operation in database 'testdb', an error occurred at log record ID (34:896:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.
2017-07-13 07:07:20.38 spid25s Always On Availability Groups data movement for database 'testdb' has been suspended for the following reason: "system" (Source ID 2; Source string: 'SUSPEND_FROM_REDO'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.
Resolve Secondary Replica Database Not Synchronizing
You cannot ONLINE a secondary replica availability group database Since a secondary replica availability group database is really in an ongoing restoring state, it cannot be set Online like the primary replica. Remember it is already in the Online role, according to the DMVs.
Solution Restart the SQL Server instance hosting the secondary replica. This will initiate recovery on all databases including availability group databases, and so long as SQL Server can access all the database files, the availability group database affected should recover successfully and resume synchronization with the primary replica.
Comments
- Anonymous
March 11, 2018
Great !! Thanks for sharing....will be life saver in case of big size databases.