Cannot join node to Azure Stack HCI cluster

RJ Riemensnider 1 Reputation point
2020-07-30T22:51:41.187+00:00

We are trying to re-add a node that we evicted from a Server 2019/Azure Stack HCI cluster, but the join fails.

I keep coming back to crypto. In a successful cluster join, you see something like this:

00002494.0000171c::2020/07/28-20:26:01.895 RetrieveHostLabel completed with status = 0
00002494.0000171c::2020/07/28-20:26:03.144 GenerateClusterCert for cert type 0 completed with status = 0
00002494.0000171c::2020/07/28-20:26:03.888 GenerateClusterCert for cert type 2 completed with status = 0
00002494.0000171c::2020/07/28-20:26:03.982 StoreClusterSecret completed with status = 0
00002494.0000171c::2020/07/28-20:26:03.999 StoreClusterCert for cert type 0 completed with status = 0
00002494.0000171c::2020/07/28-20:26:04.015 StoreClusterCert for cert type 2 completed with status = 0

We are seeing this on the system attempting to join.

00002730.00001028::2020/07/28-19:18:41.652 RetrieveHostLabel completed with status = 0

00004d68.00005920::2020/07/23-21:19:43.788 RetrieveServiceSecret completed with status = 0
00004d68.00005920::2020/07/23-21:19:43.795 RetrieveClusterCert for cert type 0 completed with status = 0
00004d68.00005920::2020/07/23-21:19:43.802 RetrieveClusterCert for cert type 2 completed with status = -2146893802

The issue appears to be with cert type 2. I can find no documentation on this, though. Does anyone have any insight?

Windows Server Clustering

7 answers

  1. RJ Riemensnider 1 Reputation point
    2020-07-30T23:05:47.75+00:00

    As more info:

    Return Values: A signed 32-bit value that indicates return status. If the method returns a negative
    value, it has failed. Zero or positive values indicate success, with the lower 16 bits in positive
    nonzero values containing warnings or flags defined in the method implementation. For more
    information about Win32 error codes and HRESULT values, see [MS-ERREF] section 2.1 and
    section 2.2.
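To make the paragraph above concrete, here is a minimal Python sketch that reinterprets the signed status from the log as an unsigned HRESULT and splits out the facility and code fields laid out in [MS-ERREF]. The `decode_status` helper and the `KNOWN_CODES` table are illustrative names, not part of any cluster tooling. The table is an assumption that covers only the two values reported in this thread, with names as defined in winerror.h.

```python
# Decode a signed 32-bit status from the cluster log into HRESULT fields.
# KNOWN_CODES is a hand-picked assumption: it lists only the two values
# reported in this thread, named as in winerror.h.
KNOWN_CODES = {
    0x80090016: "NTE_BAD_KEYSET (keyset does not exist)",
    0x80090029: "NTE_NOT_SUPPORTED (requested operation is not supported)",
}


def decode_status(status: int) -> dict:
    """Split a signed 32-bit status into its HRESULT components."""
    unsigned = status & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failed": status < 0,  # negative means the severity bit is set
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility field
        "code": unsigned & 0xFFFF,  # low 16 bits
        "name": KNOWN_CODES.get(unsigned, "unknown"),
    }


if __name__ == "__main__":
    for status in (0, -2146893802, -2146893783):
        print(status, decode_status(status))
```

Read this way, -2146893802 is 0x80090016 and -2146893783 is 0x80090029, both failures in facility 9.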

  2. Xiaowei He 9,906 Reputation points
    2020-07-31T06:48:11.783+00:00

    Hi,

    00004d68.00005920::2020/07/23-21:19:43.802 RetrieveClusterCert for cert type 2 completed with status = -2146893802

    According to your information, the issue seems to be related to the cluster certificate.
    Generally, the certificate keys are stored in C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\
    Please check the permissions on that folder on each cluster node, and make sure the "System" and "Local Administrator" accounts have full control:

    14823-sys.png

    If it still does not work, please reproduce the issue, check the cluster log, and provide the log file captured after the failed attempt to add the node so it can be analyzed.

    Best Regards,
    Anne

  3. RJ Riemensnider 1 Reputation point
    2020-07-31T12:22:27.407+00:00

    Thank you for your reply. I have checked the directory above. In Process Monitor, it accesses a keystore in that directory and then returns the successful 'RetrieveClusterCert for cert type 0 completed with status = 0'. It then accesses a keystore in C:\ProgramData\Microsoft\Crypto\Keys and C:\Windows\System32\Microsoft\Protect\S-1-5-18, and returns "RetrieveClusterCert for cert type 2 completed with status = -2146893802".

    In addition, I did find this, which shows the exact code and message I am receiving when trying to do the join. It says to remove System from the permissions. At this point I am wondering if I have a corrupt cert.

    We have scoured the cluster logs and come up with nothing. We have an open case with Microsoft; however, they haven't been able to provide anything useful.

  4. RJ Riemensnider 1 Reputation point
    2020-07-31T12:48:17.003+00:00

    We do see this in the verbose logs:

    [Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.070 ERR cxl::CertStore::IsKeyValid: (-2146893783)' because of 'NCryptExportKey(certKey, 0, BCRYPT_RSAFULLPRIVATE_BLOB, nullptr, nullptr, resultBytes, &resultBytes, 0)'
    [Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.076 ERR cxl::CertStore::IsKeyValid: (-2146893783)' because of 'NCryptExportKey(certKey, 0, BCRYPT_RSAFULLPRIVATE_BLOB, nullptr, nullptr, resultBytes, &resultBytes, 0)'
    [Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.076 ERR cxl::CertStore::IsKeyValid: (-2146893783)' because of 'NCryptExportKey(certKey, 0, BCRYPT_RSAFULLPRIVATE_BLOB, nullptr, nullptr, resultBytes, &resultBytes, 0)'
    [Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.099 ERR cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'
    [Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.121 ERR cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'
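For anyone sifting through a larger verbose log, here is a rough Python sketch that groups `ERR` lines like the ones above by source, failing call, and status shown as an unsigned hex value. The regular expression is an assumption based only on the five lines quoted here, and `summarize_errors` is a hypothetical helper name. Other cluster log lines may not follow this layout.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpts in this thread, e.g.
#   ... ERR cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(...)'
# The layout is inferred from those excerpts only (an assumption).
LINE_RE = re.compile(
    r"ERR\s+(?P<source>[\w:]+):\s+\((?P<status>-?\d+)\)'"
    r"\s+because of\s+'(?P<call>\w+)\("
)

# Two of the lines quoted above, copied verbatim.
SAMPLE = [
    "[Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.070 ERR cxl::CertStore::IsKeyValid: (-2146893783)' because of 'NCryptExportKey(certKey, 0, BCRYPT_RSAFULLPRIVATE_BLOB, nullptr, nullptr, resultBytes, &resultBytes, 0)'",
    "[Verbose] 00001c0c.000067dc::2020/07/31-08:32:15.099 ERR cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'",
]


def summarize_errors(lines):
    """Count ERR lines per (source, failing call, status as hex)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an ERR line in the expected shape
        unsigned = int(match.group("status")) & 0xFFFFFFFF
        key = (match.group("source"), match.group("call"), f"0x{unsigned:08X}")
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, count in summarize_errors(SAMPLE).items():
        print(count, key)
```

Grouping this way separates the `NCryptExportKey` failures (0x80090029) from the `NCryptOpenKey` failures (0x80090016), which carry the same status as the failed `RetrieveClusterCert for cert type 2` step.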

  5. RJ Riemensnider 1 Reputation point
    2020-07-31T13:52:32.263+00:00

    Any info on how I can display what cert a container file is linked to? This is the last accessed keyfile before the error:

    14913-annotation-2020-07-31-095106.png

    Or, how do you change which cluster member processes the cluster join? Despite moving the core cluster resources to a different node, the same node always tries to process the cluster join operation.
