Certificate issues connecting to a new Service Fabric cluster from Azure DevOps

Mark Middlemist 166 Reputation points
2020-11-12T12:05:33.223+00:00

We have just rebuilt our development Service Fabric cluster and are having trouble deploying to it from Azure DevOps.

As with our production cluster (which has been running for a couple of years now), we have created self-signed certificates in Key Vault for use as the primary and secondary management certificates. In our most recent attempts we set the subject CN to the FQDN of the cluster, as suggested by some people who had experienced similar issues.

We have set up the service connection in DevOps, ensuring there are no copy errors in either the server certificate thumbprint or the Base64 representation of the client certificate (this uses a downloaded PFX of the management certificate).
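For anyone who wants to rule out copy errors mechanically, here is a minimal sketch of the kind of check I mean. The function names `normalize_thumbprint` and `is_clean_base64` are made up for illustration and are not part of any Azure tooling. It assumes the thumbprint is a SHA-1 value (40 hex digits) and only confirms that the Base64 text decodes cleanly, not that the decoded bytes are a valid PFX.

```python
import base64
import binascii
import re


def normalize_thumbprint(raw: str) -> str:
    """Keep only hex digits and upper-case them.

    This drops spaces and invisible characters (for example U+200E) that
    can come along when a thumbprint is copied from a certificate dialog.
    """
    cleaned = re.sub(r"[^0-9A-Fa-f]", "", raw).upper()
    if len(cleaned) != 40:  # a SHA-1 thumbprint is 20 bytes = 40 hex digits
        raise ValueError(f"expected 40 hex digits, got {len(cleaned)}")
    return cleaned


def is_clean_base64(text: str) -> bool:
    """Return True if text is strict Base64 once whitespace is removed."""
    compact = "".join(text.split())
    try:
        base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

If `normalize_thumbprint` changes the pasted value, or `is_clean_base64` returns False, the value in the service connection is worth pasting again.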

When attempting to deploy, the logs show:

2020-11-12T10:57:01.1927443Z Imported cluster client certificate with thumbprint '***'.
2020-11-12T10:57:07.0396055Z ##[warning]Failed to contact Naming Service. Attempting to contact Failover Manager Service...
2020-11-12T10:57:07.1341385Z ##[warning]Failed to contact Failover Manager Service, Attempting to contact FMM...
2020-11-12T10:57:07.3667857Z Service fabric SDK version: 4.1.458.9590.
2020-11-12T10:57:07.4850147Z ##[error]FABRIC_E_SERVER_AUTHENTICATION_FAILED: 0x800b0109
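The code at the end of that error line is a Windows HRESULT. As a rough sketch, it can be split into its standard fields as shown below. The lookup tables hold only the one entry relevant here: 0x800B0109 is the Windows constant CERT_E_UNTRUSTEDROOT (the certificate chain ended in a root that the trust provider does not trust). The names `decode_hresult` and `HResult` are just illustrative, and the result hints at where to look rather than being a full diagnosis.

```python
from typing import NamedTuple


class HResult(NamedTuple):
    failure: bool  # severity bit (bit 31)
    facility: int  # bits 16-26
    code: int      # bits 0-15


# Only the values needed for this log line, not a complete table.
FACILITY_NAMES = {0x0B: "FACILITY_CERT"}
KNOWN_HRESULTS = {0x800B0109: "CERT_E_UNTRUSTEDROOT"}


def decode_hresult(value: int) -> HResult:
    """Split a 32-bit HRESULT into severity, facility and code."""
    return HResult(
        failure=bool((value >> 31) & 1),
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


parts = decode_hresult(0x800B0109)
print(parts, FACILITY_NAMES.get(parts.facility), KNOWN_HRESULTS.get(0x800B0109))
```

The facility decodes to the certificate facility, which is consistent with the machine running the deployment not trusting the root of the server certificate chain.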

To validate the certificates I have tried connecting from PowerShell using the Connect-ServiceFabricCluster cmdlet, and have found that this only works if we include the -SkipChecks $true option.
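Another cross-check, independent of the Service Fabric tooling, is to read the certificate the cluster actually presents and compare its thumbprint with the one entered in the service connection. The sketch below makes several assumptions: the cluster's HTTPS management endpoint is reachable on port 19080 (the usual default, but yours may differ), that endpoint presents the same certificate as the client connection endpoint, and the host name in the usage note is a placeholder. The helper names are hypothetical.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """Windows-style thumbprint: SHA-1 of the DER-encoded certificate."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def fetch_server_thumbprint(host: str, port: int = 19080) -> str:
    """Fetch the certificate presented by host:port and return its thumbprint.

    ssl.get_server_certificate does not validate the chain when no CA file
    is given, so this also works for self-signed certificates.
    Port 19080 is an assumed default, not something taken from the logs.
    """
    pem = ssl.get_server_certificate((host, port))
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem))
```

Calling `fetch_server_thumbprint("mycluster.example.com")` (placeholder host) should return the same 40 hex digits as the server certificate thumbprint configured in DevOps. If it does, the thumbprint is not the problem and the trust chain is the more likely culprit.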

Given that our production cluster is currently using the same scenario (i.e. using a self-signed cert) I'm confused as to what I've done wrong on this deployment. If anyone could help get this working it would be much appreciated.

We have the additional worry that the management certificate on our production cluster is due to be updated in the next couple of weeks, and obviously we can't afford to be in a position where we can't deploy to production.

Thanks in advance

Mark

Azure Service Fabric
An Azure service that is used to develop microservices and orchestrate containers on Windows and Linux.

1 answer

  1. Mark Middlemist 166 Reputation points
    2020-11-23T11:26:13.447+00:00

    Just in case anyone else has the same problem: I've worked around this. Admittedly I skipped a diagnostic step by changing two things at once, but I got it working by:

    1) Recreating the cluster with a scaleset based on the 1709 image of Windows (rather than the 1803 image that I had used originally).

    2) When I recreated the cluster, rather than using a self-signed certificate I used a purchased SSL certificate, though not actually matching any domain associated with the cluster.

    According to the documentation and the conversations I've had, neither of these things should affect the situation, but this has worked (I suspect it's #2 that made the difference).
