SQL SERVER – AlwaysOn Availability Group Stuck in Resolving State For Long time

This is one of the common issues which I heard from my clients using SQL Server AlwaysOn and they don’t know why it happens.

SQL SERVER - AlwaysOn Availability Group Stuck in Resolving State For Long time 108-1

What is resolving state in SQL Server AlwaysOn?

When there is an availability group, the replica would be either in primary state or secondary state – when its online in failover cluster manager. Resolving is an intermediate state when the transition is happening from primary to secondary or vice versa. If due to some reason the transition is not successful, it goes to “resolving” state. In this state, the database is not accessible.

What you should do to find the cause?

I always ask for ERRORLOG in such situations to know the series of events. Where is ERRORLOG? Various Ways to Find its Location

2016-02-22 17:36:14.75 Server      ***Unable to get thread context for spid 0
2016-02-22 17:36:14.75 Server      * *******************************************************************************
2016-02-22 17:36:14.75 Server      *
2016-02-22 17:36:14.75 Server      * BEGIN STACK DUMP:
2016-02-22 17:36:14.75 Server      *   04/22/15 17:36:14 spid 7416
2016-02-22 17:36:14.75 Server      *
2016-02-22 17:36:14.75 Server      * Non-yielding Scheduler
2016-02-22 17:36:14.75 Server      *
2016-02-22 17:36:14.75 Server      * *******************************************************************************
2016-02-22 17:36:14.76 Server      Stack Signature for the dump is 0x000000000000025C
……
2016-02-22 17:36:26.19 Server      Error: 19407, Severity: 16, State: 1.
2016-02-22 17:36:26.19 Server      The lease between availability group ‘ProdAG’ and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.
2016-02-22 17:36:26.20 Server      AlwaysOn: The local replica of availability group ‘ProdAG’ is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.

When I dug further I found that non-yielding came due to a restore command. The restore contributed to the IO load on the server, causing a non-yielding scheduler to occur within the SQL Server.  This kept the SQL Server from renewing the lease with the cluster, causing the lease timeout for the availability group, which then caused the availability to go into a resolving state

Along with Errorlog, it’s always good to have Windows event logs (Application and System) and Cluster log from all participating nodes.

Have you ever faced such situation where the availability group was in resolving error?

If you like this blog post, here is another blog post where we have discussed how you can add files to the database in the AlwaysOn Availability Group.

Reference: Pinal Dave (https://darkslategrey-bat-805937.hostingersite.com)

AlwaysOn, SQL Error Messages, SQL Server
Previous Post
SQL SERVER – 2016 – Check Value as JSON With ISJSON()
Next Post
SQL SERVER – Unable to start SQL Server Agent – Failed to Initialize SQL Agent log

Related Posts

Leave a Reply