Marcus-9726 asked Criszhan-msft commented

SQL Server 2012 Failover Cluster Instance on Windows Server 2008 SP2: failover and failback take over 3 minutes to bring resources online

Hi Guys,

I recently set up a SQL Server 2012 Failover Cluster Instance on Windows Server 2008 SP2. When I run a failover test, it takes a little over 3 minutes to fail over completely to the second node, and failback takes just as long. I have previously set up SQL Server clusters on Windows Server 2012, 2016 and 2019, and all of them completed a failover or failback in about 10 seconds. I don't understand why this cluster on Windows Server 2008 takes so long.

In the cluster events I can only see a warning about a DNS server failure, plus a message that the operation returned "timeout period expired". The DNS status of the cluster resource is OK, and I can see that it successfully registered the host record on the DNS server.

Any ideas? Is this caused by the OS version, or by the errors in the logs above?

Thank you.

sql-server-general windows-server

JeffreyWilliams-3310 answered Marcus-9726 commented

How are you determining that it took 3 minutes to complete? If I recall correctly, Windows Server 2008 takes time to move the disk volumes, and SQL Server takes time to recover the databases. If the transaction logs contain a lot of data to recover, that can take a long time.

So it really depends on what you are measuring to determine when the system has completed the failover.
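
One way to pin down what is being measured is to time the outage from a client's point of view. The following is a minimal sketch, not a definitive tool: it probes a TCP port once per second and collapses the results into outage windows. The host name and port in the usage note are assumptions for illustration and must be replaced with the instance's actual network name and listening port.

```python
import socket
import time
from typing import Iterable, List, Tuple


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open when the samples end is dropped.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    return windows


def poll(host: str, port: int, duration: float, interval: float = 1.0):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), can_connect(host, port)))
        time.sleep(interval)
    return samples
```

For example, starting `outage_windows(poll("sqlfci-vnn.example.local", 1433, 300))` just before a manual failover would return the window during which the (hypothetical) name and port were unreachable. Note that this measures name resolution plus TCP reachability, not database recovery.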

Hi Jeffrey,

As I mentioned, this is a new SQL FCI setup, so there are no databases in it yet; it is a fresh cluster instance. When I fail over, all the other resources go offline and come back online quickly. Only the cluster virtual network name resource takes a long time to come back online.
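
To make the "which resource is slow" comparison concrete, the offline and online times of each resource can be noted by hand from the cluster events and compared. The sketch below assumes you have already extracted those transitions yourself as (timestamp, resource, state) tuples; it does not parse any real log format, and the resource names and timestamps in the usage note are made up for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def online_durations(events: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return seconds between each resource's first 'offline' and next 'online'.

    `events` holds (ISO 8601 timestamp, resource name, state) tuples,
    where state is 'offline' or 'online' (case-insensitive).
    """
    # Parse timestamps first so ordering does not depend on string format.
    parsed = sorted(
        (datetime.fromisoformat(stamp), resource, state.lower())
        for stamp, resource, state in events
    )
    went_offline = {}
    durations = {}
    for when, resource, state in parsed:
        if state == "offline":
            # Remember only the first time the resource went offline.
            went_offline.setdefault(resource, when)
        elif state == "online" and resource in went_offline and resource not in durations:
            durations[resource] = (when - went_offline[resource]).total_seconds()
    return durations
```

With illustrative input such as `[("2021-01-01T10:00:00", "SQL Network Name", "offline"), ("2021-01-01T10:03:05", "SQL Network Name", "online")]`, the function reports how long that resource stayed offline, which makes it easy to see whether the network name resource accounts for nearly all of the failover time.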

Criszhan-msft answered Criszhan-msft commented

Hi @MarcusWong-9726,

"From cluster event I can only see a warning about the DNS server failure & that operation returned timeout period expired. The DNS server status of the cluster resource is OK and I can see it successfully registered the host record in DNS server."

Could you please share the detailed messages? We can usually get useful information from the Windows cluster log, the SQL Server error log, and the Windows event log.

"Is it due to OS version or above error logs causing this issue?"

If failover takes 3 minutes, it is more likely related to an underlying problem than to the OS version.
It is also recommended to install the latest updates for the SQL Server 2012 failover cluster instance, to rule out any problems that have already been fixed.

Hi Criszhan,

In the cluster event logs there are two repeating entries, as follows:

- DNS server failure.
- Failed to register DNS. This operation returned timeout period expired.

The errors above seem to be caused by ghosted network adapters in the server. I have modified the registry key to change the binding order, but the same issue still occurs when I fail over or fail back. Cluster validation shows no significant errors or failures. I will try to take screenshots of the detailed event logs and post them soon.

As for the SQL Server updates, let me check whether the instance is already on the latest one. I will update you as soon as possible.

Thank you.

Hi Criszhan,

I have checked the cluster event log, and these are the only errors I can see on the servers. To address them, I changed the network binding order through the registry and rebooted both nodes, but the errors still keep appearing. May I know whether they are related to the time the failover/failback takes?
[Attached screenshots of the cluster event log errors: 85516-image.png, 85534-image.png]

Hi,
After you performed a manual failover of the FCI, only the cluster network name resource took a long time to come back online, and you can see related errors in the event log. There is reason to suspect that the two are connected. I will research this issue and reply as soon as there is an update.
