Deploy a Valkey cluster on Azure Kubernetes Service (AKS)

In this article, we review the challenges of properly using Azure availability zones when running a Valkey cluster on AKS, share some scaling considerations, and propose a solution.

Important

Open-source software is mentioned throughout AKS documentation and samples. Software that you deploy is excluded from AKS service-level agreements, limited warranty, and Azure support. As you use open-source technology alongside AKS, consult the support options available from the respective communities and project maintainers to develop a plan.

For example, the Ray GitHub repository describes several platforms that vary in response time, purpose, and support level.

Microsoft takes responsibility for building the open-source packages that we deploy on AKS. That responsibility includes having complete ownership of the build, scan, sign, validate, and hotfix process, along with control over the binaries in container images. For more information, see Vulnerability management for AKS and AKS support coverage.

What is Valkey?

Valkey is a fork of the Redis project that preserves its original open-source license. Valkey is a high-performance key-value datastore that you can use for caching, session storage, message queues, and more. A Valkey cluster has multiple nodes that host your data. Valkey shards the data into smaller portions and distributes them among the nodes. In a simplified Valkey cluster consisting of three primary nodes, a single replica node supports each primary node to enable basic failover capabilities. Because the data is distributed across the primaries and copied to the replicas, the cluster can continue functioning even if one of the nodes fails.
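As a rough illustration of how sharding works, the following Python sketch maps keys to hash slots the way the cluster specification describes: a CRC16 (XMODEM) checksum of the key modulo 16384, with only the text inside the first non-empty {...} hash tag being hashed when one is present. The even split of slots across three primaries and the node names are assumptions for illustration only. Real slot ownership is decided when the cluster is created or resharded.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of hash slots in a cluster


def hash_slot(key: bytes) -> int:
    """Return the hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


def owner(slot: int, primaries: list) -> str:
    """Illustrative only: split slots into contiguous, roughly equal ranges."""
    per_node = HASH_SLOTS // len(primaries)
    return primaries[min(slot // per_node, len(primaries) - 1)]


# Hypothetical node names, not taken from the article.
PRIMARIES = ["valkey-primary-0", "valkey-primary-1", "valkey-primary-2"]

if __name__ == "__main__":
    for key in (b"session:42", b"{user:1}:cart", b"{user:1}:profile"):
        slot = hash_slot(key)
        print(key.decode(), "-> slot", slot, "->", owner(slot, PRIMARIES))
```

Keys that share a hash tag, such as {user:1}:cart and {user:1}:profile, land in the same slot and therefore on the same shard, which is what makes multi-key operations on related keys possible in a cluster.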

Screenshot of a Valkey cluster on AKS.

For more information, see the Valkey documentation.

Valkey solution overview

The goal of this solution is to deploy Valkey on AKS with the same level of service as the Azure Cache for Redis Premium tier.

The following table lists key features of the Azure Cache for Redis Premium tier and the proposed Valkey solution:

Azure Cache for Redis Premium tier | Valkey solution
Memory up to 1.2 TB | Using three Valkey primaries running on the Standard_E64_v5 SKU.
Replication | Adding at least one replica pod per primary pod.
Zone redundancy | Placing primary and replica pods in different availability zones.

We create two distinct StatefulSet resources: one for the Valkey primaries and one for the replicas. The spec.affinity field of the StatefulSet API places the primary pods in two different availability zones and the replica pods in a third availability zone. Because the primary and the replica of each shard run in different zones, a single zone failure doesn't affect the availability of any Valkey shard.
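The following Python sketch shows one way the two StatefulSet resources could be expressed, using a required node affinity on the well-known topology.kubernetes.io/zone node label. It's a minimal sketch, not the manifests used by this solution: the resource names, the zone values (eastus-1, eastus-2, eastus-3), the replica counts, and the container image are assumptions, and a real deployment also needs storage, configuration, and services. The script prints JSON, which kubectl accepts in place of YAML.

```python
import json

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def zone_affinity(zones):
    """Build an affinity stanza that only allows nodes in the given zones."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": ZONE_LABEL, "operator": "In", "values": list(zones)}
                        ]
                    }
                ]
            }
        }
    }


def statefulset(name, replicas, zones, image="valkey/valkey:latest"):
    """Build a bare-bones StatefulSet manifest pinned to the given zones."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": zone_affinity(zones),
                    "containers": [{"name": "valkey", "image": image}],
                },
            },
        },
    }


# Assumed names and zones: primaries in two zones, replicas in a third.
primaries = statefulset("valkey-primaries", 3, ["eastus-1", "eastus-2"])
replicas = statefulset("valkey-replicas", 3, ["eastus-3"])

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [primaries, replicas]}, indent=2))
```

Keeping the zone list as a parameter makes it easy to see that the two StatefulSet resources differ only in where their pods are allowed to run.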

Note

The solution suggested in this article differs from the Valkey documentation, where all cluster pods belong to a single StatefulSet and the spec.affinity only ensures that the pods are placed on different nodes. The automatic Valkey cluster initialization presented in the Valkey documentation doesn't ensure that the primary and replica pods for the same shard are placed in different availability zones.
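To make the difference concrete, the following Python sketch checks whether every shard keeps at least one pod when any single zone fails. Both layouts are hypothetical examples, not output from a real cluster: the first follows the placement described in this article, and the second assumes a single StatefulSet whose pods sit on different nodes but where one shard's primary and replica happen to share a zone.

```python
def survivors(shards, pod_zone, failed_zone):
    """Map each shard to the pods that are still running after a zone fails."""
    return {
        shard: [pod for pod in pods if pod_zone[pod] != failed_zone]
        for shard, pods in shards.items()
    }


def tolerates_single_zone_failure(shards, pod_zone):
    """True if every shard keeps at least one pod for every possible zone failure."""
    return all(
        all(survivors(shards, pod_zone, zone).values())
        for zone in set(pod_zone.values())
    )


# Layout from this article: primaries in zones 1 and 2, replicas in zone 3.
two_sets_zone = {
    "primary-0": "zone-1", "primary-1": "zone-2", "primary-2": "zone-1",
    "replica-0": "zone-3", "replica-1": "zone-3", "replica-2": "zone-3",
}
two_sets_shards = {
    "shard-0": ["primary-0", "replica-0"],
    "shard-1": ["primary-1", "replica-1"],
    "shard-2": ["primary-2", "replica-2"],
}

# Hypothetical single StatefulSet: six pods on six nodes, but pods 0 and 3
# form one shard and both run in zone-1.
single_set_zone = {
    "valkey-0": "zone-1", "valkey-1": "zone-2", "valkey-2": "zone-3",
    "valkey-3": "zone-1", "valkey-4": "zone-2", "valkey-5": "zone-3",
}
single_set_shards = {
    "shard-0": ["valkey-0", "valkey-3"],
    "shard-1": ["valkey-1", "valkey-4"],
    "shard-2": ["valkey-2", "valkey-5"],
}

if __name__ == "__main__":
    print("two StatefulSets:", tolerates_single_zone_failure(two_sets_shards, two_sets_zone))
    print("single StatefulSet:", tolerates_single_zone_failure(single_set_shards, single_set_zone))
```

In the second layout, losing zone-1 removes both pods of shard-0, so spreading pods across nodes alone isn't enough to protect a shard from a zone failure.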

Next steps

Contributors

Microsoft maintains this article. The following contributors originally wrote it:

  • Nelly Kiboi | Service Engineer
  • Saverio Proto | Principal Customer Experience Engineer
  • Don High | Principal Customer Engineer
  • LaBrina Loving | Principal Service Engineer
  • Ken Kilty | Principal TPM
  • Russell de Pina | Principal TPM
  • Colin Mixon | Product Manager
  • Ketan Chawda | Senior Customer Engineer
  • Naveed Kharadi | Customer Experience Engineer
  • Erin Schaffer | Content Developer 2