Deploy a MongoDB cluster on Azure Kubernetes Service (AKS)

This article walks through prerequisite information for deploying a MongoDB cluster on AKS and provides an overview of the deployment strategy.

Important

Open-source software is mentioned throughout AKS documentation and samples. Software that you deploy is excluded from AKS service-level agreements, limited warranty, and Azure support. As you use open-source technology alongside AKS, consult the support options available from the respective communities and project maintainers to develop a plan.

For example, the Ray project's GitHub repository describes several support platforms that vary in response time, purpose, and support level.

Microsoft takes responsibility for building the open-source packages we deploy on AKS. That responsibility includes having complete ownership of the build, scan, sign, validate, and hotfix process, as well as control over the binaries in container images. For more information, see Vulnerability management for AKS and AKS support coverage.

What is MongoDB?

MongoDB is a popular NoSQL database management system designed to handle large volumes of unstructured data. Unlike traditional relational databases that use tables and rows, MongoDB uses a flexible, document-oriented approach.

Note

MongoDB Community Edition isn't open-source software. It's source-available software licensed under the Server Side Public License (SSPL).

MongoDB sharded cluster

A MongoDB sharded cluster handles large datasets and high throughput by distributing data across multiple servers, called shards. This architecture enables horizontal scaling, which is essential for applications with growing data and performance needs.

Here’s a breakdown of its key components and how it works:

  • Shards: Shards are individual MongoDB instances that hold subsets of the data. Each shard is a replica set, or a group of MongoDB instances that replicate data among themselves, ensuring high availability and fault tolerance.
  • Config servers: Config servers store metadata and configuration settings for the sharded cluster. They keep track of the cluster’s data distribution and routing information. There are typically three config servers to provide redundancy.
  • Mongos instances: Mongos is a routing service that directs client requests to the appropriate shard. It acts as an intermediary between the client and the shards, managing query routing and aggregating results from multiple shards.
  • Shard key: The shard key is either a single indexed field or multiple fields covered by a compound index. It determines how documents are partitioned among the shards. A well-chosen shard key promotes even data distribution and efficient querying.
  • Data distribution: MongoDB partitions data across shards using either range-based or hash-based sharding, depending on how the shard key is defined. Spreading the data this way helps balance the load and keeps large datasets manageable.
  • High availability: Each shard is a replica set, meaning it replicates its data across multiple nodes. This setup ensures that data remains available even if one or more nodes fail.
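The difference between range-based and hash-based distribution can be illustrated with a short Python sketch. This is a simplified model for intuition only, not MongoDB's internal algorithm: the shard names, the range boundaries, and the helper functions `hashed_shard` and `ranged_shard` are all hypothetical.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Exclusive upper bounds of the key ranges owned by the first two shards.
# The last shard owns every key at or above the final bound.
RANGE_BOUNDS = ["h", "q"]


def hashed_shard(key, shards=SHARDS):
    """Pick a shard from a hash of the shard key value (simplified model)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the shard list.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def ranged_shard(key, shards=SHARDS, bounds=RANGE_BOUNDS):
    """Pick a shard by locating the shard key value within sorted ranges."""
    return shards[bisect.bisect_right(bounds, key)]


if __name__ == "__main__":
    for user in ["alice", "mike", "zoe"]:
        print(user, "range:", ranged_shard(user), "hash:", hashed_shard(user))
```

In this model, neighboring key values land on the same shard under range-based distribution, which tends to suit range queries. Hash-based distribution scatters neighboring values, which tends to spread writes more evenly when key values increase monotonically.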

What is the Percona Operator for MongoDB?

The Percona Operator for MongoDB is an open-source tool, developed by Percona, that automates the deployment, management, and scaling of MongoDB clusters in Kubernetes environments. It simplifies operations by handling tasks such as provisioning, scaling, backup, and recovery while maintaining the high availability and performance of the clusters.

The operator uses Kubernetes Custom Resource Definitions (CRDs) to manage MongoDB configurations declaratively and to handle failovers, monitoring, and alerts. This approach reduces administrative overhead and keeps management practices consistent. The Percona Operator is suitable for development, testing, and production scenarios, and it improves the efficiency and reliability of MongoDB deployments, particularly in cloud-native applications.
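Declarative management means you describe the state you want, and a controller works out the actions that move the observed state toward it. The following Python sketch shows that general reconciliation idea. It is a conceptual illustration only and does not reflect the Percona Operator's actual code or custom resource schema: `ClusterSpec`, `reconcile`, and the `rs0`-style names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Hypothetical desired state, as a custom resource might declare it."""
    shards: int
    members_per_shard: int


def reconcile(desired: ClusterSpec, running: dict[str, int]) -> list[str]:
    """Return the actions needed to move the observed state toward the desired state.

    `running` maps a replica set name to the number of members currently running.
    """
    actions = []
    wanted = [f"rs{i}" for i in range(desired.shards)]
    for name in wanted:
        have = running.get(name, 0)
        if have < desired.members_per_shard:
            actions.append(f"scale up {name}: {have} -> {desired.members_per_shard}")
        elif have > desired.members_per_shard:
            actions.append(f"scale down {name}: {have} -> {desired.members_per_shard}")
    # Anything running that the spec no longer declares gets removed.
    for name in sorted(set(running) - set(wanted)):
        actions.append(f"remove {name}")
    return actions


if __name__ == "__main__":
    spec = ClusterSpec(shards=2, members_per_shard=3)
    print(reconcile(spec, {"rs0": 3, "rs1": 1}))
```

When the observed state already matches the declared state, the function returns an empty list, which mirrors how a controller does nothing once the cluster has converged.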

Diagram of MongoDB cluster.

MongoDB solution overview

The goal of the proposed solution is a MongoDB cluster that handles large datasets and high-throughput operations while maintaining high availability and fault tolerance. The solution achieves this through replica sets, anti-affinity rules, and proper resource allocation.

Deployment strategy

The MongoDB deployment strategy consists of the following components:

  • A sharded cluster that distributes data across multiple shards to improve scalability and performance.
  • Config servers deployed as a three-member replica set for fault tolerance and high availability, with anti-affinity rules that place the members in different failure domains.
  • Three Mongos instances that route client requests. The instances are distributed across availability zones for resiliency and exposed only within the cluster, where requests are load balanced across them.
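To see why spreading a three-member replica set across failure domains matters, consider the following minimal Python sketch. It assumes three hypothetical availability zone names and checks whether a majority of members, which a replica set needs to elect a primary, remains after any single zone is lost. The helper names `spread` and `survives_zone_loss` are illustrative only.

```python
from __future__ import annotations

from collections import Counter

ZONES = ["zone-1", "zone-2", "zone-3"]  # hypothetical availability zone names


def spread(members: int, zones: list[str] = ZONES) -> list[str]:
    """Assign replica set members to zones in round-robin order."""
    return [zones[i % len(zones)] for i in range(members)]


def survives_zone_loss(placement: list[str]) -> bool:
    """Return True if a majority of members remains after losing any single zone."""
    majority = len(placement) // 2 + 1
    per_zone = Counter(placement)
    return all(len(placement) - lost >= majority for lost in per_zone.values())


if __name__ == "__main__":
    spread_out = spread(3)
    packed = ["zone-1"] * 3
    print(spread_out, survives_zone_loss(spread_out))
    print(packed, survives_zone_loss(packed))
```

With one member per zone, losing any zone still leaves two of three members. With all three members in one zone, losing that zone leaves none, which is the situation anti-affinity rules are meant to prevent.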

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors:

  • Nelly Kiboi | Service Engineer
  • Saverio Proto | Principal Customer Experience Engineer
  • Don High | Principal Customer Engineer
  • LaBrina Loving | Principal Service Engineer
  • Ken Kilty | Principal TPM
  • Russell de Pina | Principal TPM
  • Colin Mixon | Product Manager
  • Ketan Chawda | Senior Customer Engineer
  • Naveed Kharadi | Customer Experience Engineer
  • Erin Schaffer | Content Developer 2
  • Carol Smith | Senior Content Developer

Next steps