Failover Cluster DNS error, event 1257 keeps coming back

James Edmonds 816 Reputation points
2022-02-24T10:54:37.4+00:00

Hi,

I have two failover clusters created, for which I did NOT pre-create the DNS records for the cluster or role names.
The DNS records are created when the cluster/role is brought online.

On one of the clusters, I keep getting event ID 1257, where it fails to register or update the DNS entry for the role running on the cluster.
If I delete the existing record, and restart the role, it creates successfully.

I am trying to understand why, if the cluster itself creates the record, this error keeps coming back.
What can I do to stop the cluster from constantly logging this error, when both cluster nodes have access to that DNS record?

Thanks
James

Windows DHCP
Windows Server Clustering

15 answers

  1. Marcus Abarbanel 6 Reputation points
    2022-09-16T20:51:29.517+00:00

    The silence from Microsoft on this is deafening. To think that an enterprise-level feature is basically broken, and has been broken for a while now. I just closed a ticket with Microsoft support regarding this issue, and they were of no help at all. The tech did indicate that this is a known issue at Microsoft but could not give me an ETA on a fix or patch. He also refused to produce the 'internal' Microsoft documentation regarding this 'Known Issue'. The real kicker is that this issue can be resolved by taking the cluster offline, running the Update-ClusterNetworkNameResource command via PowerShell (which works sporadically), and/or executing a repair on the cluster; sometimes it also self-heals. Because of this, the Microsoft tech always has an out: they are technically break/fix, so if something isn't currently broken they refuse to help and tell you to submit a root cause analysis ticket to Microsoft Premier support. My argument is that these issues are SYMPTOMS of the real issue. If Microsoft knows the cause, they need to provide information on what causes it, HOW TO FIX IT, and/or a patch or an ETA on when a patch will be published.

    1 person found this answer helpful.

  2. Steve Cahill 11 Reputation points
    2022-11-23T03:20:55.307+00:00

    I logged a case with Microsoft last year and the issue was acknowledged and fixed in Nov 21 KB5007266: "Addresses an issue that prevents Failover Clustering from updating Domain Name Server (DNS) records".

    The issue then reared its head again in Feb 2022, but it is intermittent for us, depending on which node in which site is rebooted in which order, etc. (normally during patching). I finally gave in and logged another call last month, and just today we closed it, with the "workaround" being the only solution they can provide for now, because supposedly they have not heard of it before! I referred them to this thread, the Nov 21 CU, etc., but unfortunately both were brushed over, with the focus instead put on capturing more logs, recreating/repairing, etc. :(

    Anyway, the fix for us was to give each CNO account Full Control on its respective CNO DNS record, and to do the same for any related VCO accounts. In some cases, we also did the same on the VCO records (added Full Control for the CNO and VCO objects themselves), although I am not entirely convinced that was necessary.

    As part of "archiving" the case, I asked Microsoft to update their official documentation to reflect that this workaround is required to stop Event 1257 intermittently going off when a node is rebooted.

    1 person found this answer helpful.

  3. AL-TechAdmin 1 Reputation point
    2022-04-24T19:55:21.907+00:00

    James,

    I have the same problem, and I have not been able to resolve it yet.

    This is what I noticed. I don't think we can simply delete the CNO object, as Microsoft recommends, or re-create it manually and grant it the appropriate permissions (i.e. “Allow any authenticated user to update DNS record with the same owner name”).

    On day 1, after I manually created it, the error didn't repeat, but the problem hadn't actually gone away.
    On day 7, the DNS error (event ID 1257) came back.

    The action above only creates the DNS record as STATIC (as opposed to dynamic), with the date/time stamp of the last update by the virtual cluster server.

    On another set of cluster servers (different DNS server, different domain), I noticed that the CNO name is dynamically created, not static.

    I wonder, how do we create this CNO object DNS name dynamically in the first place?


  4. Jack Dobiash 6 Reputation points
    2022-06-03T20:50:21.227+00:00

    Hey all, we are also experiencing this on two of our clusters. I'm pretty sure it's a bug that was introduced by a patch sometime near the start of the year. We have a third cluster which has not had the issue, but it hasn't been updated in a while. We are running Server 2016 clusters. I know pretty much exactly WHAT the problem is, but not how to fix it. The issue occurs every time the CNO password gets updated, which happens around every 21 days (at least on our system). Once the password has been updated by the 'core' node, it then somehow 'forgets' what that new password is, at least when trying to update the DNS registration. The underlying cluster event logs even indicate that it's basically failing to log in to the DNS server when attempting to update the DNS record. If we just move the core resources from one node to another (and even back to the first node), things start working again, until the next time it updates the password on the CNO. The other option is to take the 'Name' resource offline and bring it back online; that also fixes it until the next password change.

    If you want to see when your CNO password was last updated, check the Attributes of the actual Cluster Object in AD and look for 'pwdLastSet'. In our case, it's been like clockwork each time the password is updated on both clusters. Within 24 hours of the update it starts to complain again (since it updates DNS once a day).
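
If you read 'pwdLastSet' as a raw value (for example from an LDAP export) rather than in the attribute editor, it is a Windows FILETIME: the number of 100-nanosecond intervals since 1601-01-01 UTC. Below is a minimal Python sketch for decoding it and checking whether an event 1257 timestamp falls in the window described above. The function names are hypothetical, the 21-day interval and 24-hour window are simply the figures reported in this post (not documented Microsoft values), and the sample raw value is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC, counted in 100-nanosecond ticks.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(raw: int) -> datetime:
    """Convert a raw pwdLastSet value to a timezone-aware UTC datetime."""
    # Integer-divide ticks down to microseconds to avoid float rounding.
    return FILETIME_EPOCH + timedelta(microseconds=raw // 10)


def next_expected_change(pwd_last_set: datetime, interval_days: int = 21) -> datetime:
    """Estimate the next CNO password change.

    interval_days=21 is an assumption taken from this post.
    """
    return pwd_last_set + timedelta(days=interval_days)


def follows_password_change(event_time: datetime,
                            pwd_last_set: datetime,
                            window_hours: int = 24) -> bool:
    """True if the event was logged within window_hours after the change.

    window_hours=24 is an assumption taken from this post.
    """
    return pwd_last_set <= event_time <= pwd_last_set + timedelta(hours=window_hours)


if __name__ == "__main__":
    sample_raw = 132985152000000000  # made-up sample value
    changed = filetime_to_utc(sample_raw)
    print("Password last set (UTC):", changed.isoformat())
    print("Next change expected around:", next_expected_change(changed).isoformat())
```

Comparing the decoded time against the timestamps of your 1257 events should show whether your cluster follows the same pattern.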

    I'm hoping maybe someone else can confirm they are in the same situation? Thanks!


  5. Hutton, Gregory 1 Reputation point
    2022-07-18T14:37:03.593+00:00

    Following as well

