Cache Server Maintenance Considerations (Windows Server AppFabric Caching)

Windows Server AppFabric caching features rely on physical servers to support the cache cluster. All servers require maintenance at some point, and that maintenance often requires a server reboot. This topic describes important considerations for minimizing or avoiding cache cluster downtime when server reboots are required for maintenance.

Rebooting Servers

When deployed on a single server, the cluster configuration storage location is a single point of failure for the cache cluster. If lead hosts perform the cluster management role, rebooting too many of the wrong cache servers could also cause the cache cluster to shut down.

Cluster Configuration Storage Location

The cluster configuration storage location can be a SQL Server database or a shared network folder. Without access to it, the cache cluster cannot run for more than a few minutes. Before rebooting the server hosting the SQL Server or file server, shut down the cache cluster with the Stop-CacheCluster command. This command stops the cache host Windows services on all cache servers in the proper order. For more information about the cluster configuration storage location, see Cluster Configuration Storage Options (Windows Server AppFabric Caching).

Cache Server

As a general rule, we recommend rebooting only one cache server at a time. No special procedures are required when you shut down a server for a reboot. If you want to stop only the cache host Windows service, use the Stop-CacheHost command. Stopping the service with the Windows Services console is not supported. After a reboot, use the Start-CacheHost command to allow the cache host Windows service to rejoin the cluster. For more information, see Using Windows PowerShell to Manage Windows Server AppFabric Caching Features.

When lead hosts are performing the cluster management role, for the cache cluster to remain available, a majority of lead hosts must remain available. If this is the case in your cluster, reboot only a small minority of lead hosts at any one time to avoid the cache cluster shutting down. Non-lead hosts can be rebooted at any time without affecting the running state of the cache cluster. For more information about lead hosts, see Lead Hosts and Cluster Management (Windows Server AppFabric Caching).

Note

The Stop-CacheHost command will not stop a cache host Windows service if it is performing the cluster management role and stopping the cache host will cause the entire cluster to shut down.

If SQL Server performs the cluster management role, you do not need to consider whether or not the cache host is a lead host. The cluster can continue to run with only one cache host.
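The two rules above (a majority of lead hosts when lead hosts manage the cluster, or at least one cache host when SQL Server does) can be sketched as a small pre-reboot check. This is only an illustrative model, not an AppFabric API: the function name, the host names, and the "lead_hosts" / "sql" labels are assumptions made for the example.

```python
# Illustrative model of the reboot rules described above.
# All names here are hypothetical; this does not call any AppFabric API.

def cluster_survives_reboot(hosts, rebooting, management_role):
    """Return True if the cache cluster should stay up during the reboot.

    hosts           -- dict mapping cache host name -> True if it is a lead host
    rebooting       -- set of cache host names that will be rebooted together
    management_role -- "lead_hosts" or "sql" (who performs cluster management)
    """
    remaining = [name for name in hosts if name not in rebooting]

    if management_role == "sql":
        # With SQL Server managing the cluster, one running cache host is enough.
        return len(remaining) >= 1

    # With lead hosts managing the cluster, a strict majority of the
    # lead hosts must remain available.
    lead_total = sum(1 for is_lead in hosts.values() if is_lead)
    lead_up = sum(1 for name in remaining if hosts[name])
    return lead_up * 2 > lead_total


# Example cluster: three lead hosts and one non-lead host.
example_hosts = {"cache1": True, "cache2": True, "cache3": True, "cache4": False}
```

With this example cluster, rebooting cache1 alone keeps two of three lead hosts running, so the check passes. Rebooting cache1 and cache2 together leaves only one lead host, so the check fails when lead hosts manage the cluster but still passes when SQL Server does.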

Any time that the cache host Windows service on a cache server stops, all data in memory on that computer is flushed. To help insulate applications from this loss of data due to server reboots, enable the high availability feature on the named caches. This has some performance impact, but the cost of reloading the cache after a reboot may outweigh that additional overhead.

For the high availability feature to help insulate your application from the failure (or stopping) of a cache host, at least three cache hosts must be members of the cache cluster. This is due to a strong consistency requirement stating that there must always be two copies of a cached object or region in a high availability-enabled cache. To maintain two copies of a cached object or region, a high availability-enabled cache requires at least two running cache hosts to function, so a cluster needs three cache hosts to keep two running when one of them fails or is stopped. For more information, see High Availability (Windows Server AppFabric Caching).
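The arithmetic behind the three-host minimum can be written out as a short sketch. The constant and function names are assumptions for illustration only. The model simply encodes the rule that a high availability-enabled cache needs two running cache hosts.

```python
# Illustrative arithmetic for the high availability minimum described above.
# Names are hypothetical; this does not query a real cache cluster.

MIN_RUNNING_HOSTS_FOR_HA = 2  # two copies must live on two different cache hosts


def ha_cache_available(total_hosts, hosts_stopped):
    """True if a high availability-enabled cache can still function."""
    return total_hosts - hosts_stopped >= MIN_RUNNING_HOSTS_FOR_HA


def max_hosts_stoppable(total_hosts):
    """How many cache hosts can be stopped at once without breaking HA caches."""
    return max(0, total_hosts - MIN_RUNNING_HOSTS_FOR_HA)
```

Under this model, a three-host cluster tolerates one stopped host, while a two-host cluster tolerates none, which is why three hosts is the practical minimum.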

In addition to the minimum number of servers required to keep the cluster running, it is also important to consider the memory needs of your application. The caching needs of the application are not likely to change solely because a cache server rebooted. It is important that sufficient cache servers remain running to support the caching needs of the application.
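As a rough illustration of the memory consideration, the following sketch checks whether the cache hosts that stay running still cover the application's cache size. The host names, the memory figures, and the idea of tracking cache memory per host in a dictionary are assumptions for the example, not values read from a real cluster.

```python
# Illustrative capacity check; all names and numbers are hypothetical.

def enough_cache_memory(host_cache_mb, rebooting, required_mb):
    """True if hosts that remain running can hold the application's cached data.

    host_cache_mb -- dict mapping cache host name -> cache memory in MB
    rebooting     -- set of host names that will be offline
    required_mb   -- cache memory the application needs, in MB
    """
    available_mb = sum(
        mb for name, mb in host_cache_mb.items() if name not in rebooting
    )
    return available_mb >= required_mb


# Example: three hosts with 4 GB of cache memory each.
example_capacity = {"cache1": 4096, "cache2": 4096, "cache3": 4096}
```

In this example, an application that needs 8,000 MB is still covered with one host offline (8,192 MB remain), but not with two hosts offline.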

Administration Recommendations

To simplify the rebooting sequence, we recommend using a SQL Server database to store your configuration settings and having that instance of SQL Server perform the cluster management role. This way, it does not matter which of the cache servers is rebooted during maintenance.

To optimize availability of the cache cluster, we recommend using Windows Server 2008 Failover Clustering to host a clustered database or folder resource for the cache cluster configuration storage location. Because more than one Windows Server 2008 server hosts the clustered storage location, you can reboot one Failover Clustering node at a time without affecting the availability of the cache cluster configuration storage location.

When possible, oversize your cache cluster by adding more cache servers than your application needs at the current time. This will enable you to reboot a small number of cache servers without any material impact to the cache cluster performance.
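One way to reason about oversizing is to provision enough cache servers for the application's needs and then add one server for each server you expect to reboot at the same time. The sketch below uses hypothetical names and figures and ignores other factors, such as the extra memory that high availability copies consume.

```python
import math

# Illustrative sizing estimate; names and numbers are hypothetical.

def servers_to_provision(required_mb, per_server_mb, concurrent_reboots=1):
    """Cache servers needed so that reboots do not reduce usable capacity.

    required_mb        -- cache memory the application needs, in MB
    per_server_mb      -- cache memory each cache server contributes, in MB
    concurrent_reboots -- how many servers may be offline at the same time
    """
    needed = math.ceil(required_mb / per_server_mb)
    return needed + concurrent_reboots
```

For example, an application that needs 10,000 MB on servers that each contribute 4,096 MB needs three servers, so four servers allow one to be rebooted without reducing usable capacity.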

Actions That Require Downtime

Several actions require downtime of the cache cluster. The common theme for all of the actions listed below is that they require changes to the cache cluster configuration.