SharePoint 2016 Distributed Cache Failed to connect to hosts

Debra Dorn 21 Reputation points
2023-01-12T17:39:55.69+00:00

I patched my Test farm (SP2016 on-premises, custom server role/single-server farm) with the January 2023 SharePoint 2016 update and ran psconfig. Psconfig failed on the first run with a Distributed Cache error and a “path” issue. Two timer jobs have been failing every 5 minutes since the update/psconfig:

User Profile Service Application Proxy – Feed Cache Full Repopulation Job

User Profile Service Application Proxy – Feed Cache Repopulation Job

In running commands on the farm, Use-CacheCluster failed stating no cache host was found. Checked the DC and the service was running and the service instance up and operational. So these 2 timer jobs consistently fail.

I removed and rebuilt the Distributed Cache, with the same result: Use-CacheCluster fails to find a host, but Get-CacheHost reports the host as UP. I checked permissions on the cache host and the registry key. Remote Registry is running. I also added an entry to the HOSTS file, and that didn't help.
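The basic checks described above could be scripted roughly as follows. This is a minimal sketch, not part of the original post: it assumes an elevated PowerShell session on the cache host of a single-server farm, and it reuses port 22233 and the AppFabricCachingService name from the Get-CacheHost output later in this question. The `$cacheHost` variable is a placeholder introduced for illustration.

```powershell
# Sketch only: run in an elevated PowerShell session on the cache host.
# Assumption: single-server farm, so the local machine is the cache host.
$cacheHost = $env:COMPUTERNAME

# Confirm that the Windows services involved are running.
Get-Service -Name AppFabricCachingService, RemoteRegistry |
    Select-Object Name, Status

# Confirm that the cache port (22233 in this thread) accepts connections.
Test-NetConnection -ComputerName $cacheHost -Port 22233 |
    Select-Object ComputerName, RemotePort, TcpTestSucceeded
```

Both checks passing does not rule out a permissions problem; it only shows that the services and the port are reachable.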

In the ULS logs, the correlation ID brings up several unexpected errors, but they all say the cache is "probably down", which it is not.

The Execute method of job definition Microsoft.Office.Server.UserProfiles.FeedCacheRepopulationJob (ID 30b5c6b9-ad9d-4f6a-a4ac-2b3340d73989) threw an exception. More information is included below. Unexpected exception in FeedCacheService.IsRepopulationNeeded: Unable to create a DataCache. SPDistributedCache is probably down... (Correlation=7e928ba0-8ffa-c0c7-b936-cd8408c9fc20)

The ULS log shows this and it's the same error that PSCONFIG threw:

Unexpected error while executing ExportCacheClusterConfig with parameters provider: 'SPDistributedCacheClusterProvider' , connectionString: ' (String points to the SQL server).

Unexpected Exception in getting cache cluster security config - Exception 'System.Management.Automation.ParameterBindingException: A parameter cannot be found that matches parameter name 'Path'.

at System.Management.Automation.Runspaces.PipelineBase.Invoke(IEnumerable input)

at Microsoft.SharePoint.DistributedCaching.Utilities.SPVelocityPowerShellWrapper.ExportCacheClusterConfig(String provider, String connectionString, String path)

at Microsoft.Office.Server.DistributedCaching.SPDistributedCachePointerWrapper.InitializeDataCacheFactory()'.

Retrieving all the caches:

PS C:\AdminTools\Powershell\scripts> Use-CacheCluster
$caches = Get-Cache | select cachename
foreach ($cache in $caches)
{$cache.CacheName}

Use-CacheCluster : ErrorCode<ERRCAdmin040>:SubStatus<ES0001>:Failed to connect to hosts in the cluster
At line:1 char:1
+ Use-CacheCluster
    + CategoryInfo : NotSpecified: (:) [Use-CacheCluster], DataCacheException
    + FullyQualifiedErrorId : Microsoft.ApplicationServer.Caching.DataCacheException,Microsoft.ApplicationServer.Caching.Commands.UseCacheClusterCommand
    

default

DistributedAccessCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedActivityFeedCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedActivityFeedLMTCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedBouncerCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedClientSideAppUpdateTimeCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedDefaultCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedFileLockThrottlerCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedHealthScoreCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedLogonTokenCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedResourceTallyCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedSearchCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedSecurityTrimmingCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedServerToAppServerAccessTokenCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedSharedWithUserCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedUnifiedGroupsCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

DistributedViewStateCache_43be6e76-508e-4fde-b6a3-9bb78468fc7c

Use-CacheCluster : ErrorCode<ERRCAdmin040>:SubStatus<ES0001>:Failed to connect to hosts in the cluster
At line:1 char:1
+ Use-CacheCluster
    + CategoryInfo : NotSpecified: (:) [Use-CacheCluster], DataCacheException
    + FullyQualifiedErrorId : Microsoft.ApplicationServer.Caching.DataCacheException,Microsoft.ApplicationServer.Caching.Commands.UseCacheClusterCommand

PS C:\AdminTools\Powershell\scripts> Get-CacheHost

HostName : CachePort      Service Name            Service Status Version Info
SERVER.DOMAIN.COM:22233   AppFabricCachingService UP             0 [0,0][0,0]

I am unable to resolve this error. At one point during the many rebuilds, Use-CacheCluster did come back, but the cache host was not built. I checked two other farms (Dev and Prod) and both show the same DC behavior. They do not have the timer job issues, since I rolled back the patch on Dev and will not patch Prod until I resolve my Test farm (a clean vanilla farm). These farms were clean, with no issues from the December 2022 update or any other update in the past year. The cache host issue appeared after last month's updates, though I don't think they caused it. All 3 farms have the same cache host issue, but the DC/AppFabric service is operational, with no obvious issues or errors in the Dev/Prod farms.


Accepted answer
  1. Wendy Li_MSFT 1,711 Reputation points Microsoft Vendor
    2023-01-19T06:30:21.2533333+00:00

    @Debra Dorn

    Glad to know your issue is resolved, and thanks for sharing here.

    By the way, the Microsoft Q&A community has a policy that "You can only accept answers from other users, i.e., you cannot accept your own answer". Following the scenario described in "Answering your own questions on Microsoft Q&A", I am posting a brief summary of this thread:

    Issue Symptom:

    After the Test farm (SP2016 on-premises, custom server role/single-server farm) was patched with the January 2023 SharePoint 2016 update and psconfig was run, psconfig failed on the first run with a Distributed Cache error and a “path” issue. Two timer jobs have been failing every 5 minutes since the update/psconfig.

    Current status:

    From Debra's reply, the following fixed this issue.

    "Only on the DC did I make this change. It was the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurePipeServers\winreg key. It already had Local Administrators and the service account was there, but it was blocked. I added the WSS_WPG group and gave it read permissions. This is the Remote Registry key. In my environment it's possible a security change or policy change was pushed as it wasn't this way in December. All works now. Use-CacheCluster is clean and get-cachehost comes back with correct information."

    You can click the "Accept Answer" button for this summary; this makes it easier for other community members to find the useful information when reading this thread. Thanks for your understanding!


4 additional answers

  1. Aneel v 250 Reputation points
    2023-01-13T05:35:12.9466667+00:00

    It seems like there is an issue with your SharePoint Distributed Cache after patching and running psconfig. It's possible that the Distributed Cache service is not properly running or configured on the server.

    You can try the following steps to troubleshoot the issue:

    1. Verify that the Distributed Cache service is running on the server. Open Services on the server and look for the AppFabric Caching Service.
    2. Check that the Distributed Cache service is properly configured by running the following command in PowerShell: Use-CacheCluster
    3. If the service is not running, try starting it manually and then run the Use-CacheCluster command again.
    4. If the service is running and configured correctly but you still get an error, try recreating the Distributed Cache service:

       • Stop the Distributed Cache service
       • Delete the Distributed Cache service
       • Run the following command to recreate the service: Add-SPDistributedCacheServiceInstance
       • Start the Distributed Cache service

    5. Check the ULS logs for more detailed information on the specific error.
    6. Check the permissions on the cache host and the registry key.
    7. Check the HOSTS file entry.
    8. Check the correlation ID in the ULS logs; it may give you more information on the error.
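The verification and rebuild steps in this answer could be sketched as below. This is an illustration under assumptions, not a tested procedure: it assumes the SharePoint 2016 Management Shell is running as a farm administrator on the cache host itself, and that losing the cached data on this server is acceptable. The `-Graceful` switch is an addition not mentioned in the answer.

```powershell
# Sketch only: run in the SharePoint 2016 Management Shell on the cache host.

# Check the Windows service and the SharePoint service instance.
Get-Service -Name AppFabricCachingService | Select-Object Name, Status
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq 'Distributed Cache' } |
    Select-Object Server, Status

# Rebuild the Distributed Cache service instance on the local server.
Stop-SPDistributedCacheServiceInstance -Graceful   # stops the local instance
Remove-SPDistributedCacheServiceInstance           # removes it from this server
Add-SPDistributedCacheServiceInstance              # re-creates and starts it

# Re-test cluster connectivity after the rebuild.
Use-CacheCluster
Get-CacheHost
```

In this thread the rebuild alone did not help, so a clean result from the last two commands is the thing to look for rather than the rebuild completing without errors.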

    It's also a good idea to check Microsoft's documentation, troubleshooting guides, and KB articles for any known issues with the January 2023 SharePoint 2016 update and the Distributed Cache service.


  2. Debra Dorn 21 Reputation points
    2023-01-17T18:22:50.6666667+00:00

    I was able to resolve this by digging down into every log. DC uses Remote Registry to connect, and the winreg key did not have proper permissions even though the local Administrators group was there. I added the WSS_WPG group with READ access and that resolved the issue.
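For anyone who wants to inspect or script the same change, the following is a minimal sketch of what is described above. It assumes an elevated PowerShell session on the affected server, that WSS_WPG is a local group on that machine, and that the winreg key is the one under HKLM\SYSTEM\CurrentControlSet\Control\SecurePipeServers named in the accepted answer. The inheritance settings applied in the original fix are not stated, so the simplest rule (read access on the key itself) is used here as an assumption.

```powershell
# Sketch only: this changes registry security; review the current ACL first.
$keyPath = 'HKLM:\SYSTEM\CurrentControlSet\Control\SecurePipeServers\winreg'

# Inspect who currently has access to the Remote Registry key.
$acl = Get-Acl -Path $keyPath
$acl.Access |
    Select-Object IdentityReference, RegistryRights, AccessControlType, IsInherited

# Build an Allow rule granting read access to the local WSS_WPG group.
$ruleArgs = @(
    "$env:COMPUTERNAME\WSS_WPG",   # assumption: WSS_WPG is a local group
    [System.Security.AccessControl.RegistryRights]::ReadKey,
    [System.Security.AccessControl.AccessControlType]::Allow
)
$rule = New-Object -TypeName System.Security.AccessControl.RegistryAccessRule -ArgumentList $ruleArgs

# Add the rule and write the updated ACL back to the key.
$acl.AddAccessRule($rule)
Set-Acl -Path $keyPath -AclObject $acl
```

If a group policy manages this key, as suspected in this environment, a manual change may be overwritten at the next policy refresh, so the policy itself may also need checking.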


  3. Miguel Godinho 0 Reputation points Microsoft Employee
    2023-02-16T15:57:32.9533333+00:00

    There is an update on this issue, for which a fix was requested after the January 2023 CU was installed:
    Trending Issue: Distributed Cache problems after applying January 2023 CU – Stefan Goßner:

    https://blog.stefan-gossner.com/2023/02/16/trending-issue-distributed-cache-problems-after-applying-january-2023-cu/


  4. Stefan Goßner 656 Reputation points Microsoft Employee
    2023-02-16T20:35:30.7433333+00:00

    @Debra Dorn @Aneel v We have identified the root cause of the issue and published the following blog post, which includes steps to analyze and resolve it:

    https://blog.stefan-gossner.com/2023/02/16/trending-issue-distributed-cache-problems-after-applying-january-2023-cu/

