Cloud Design Patterns

These design patterns are useful for building reliable, scalable, secure applications in the cloud.

Each pattern describes the problem that the pattern addresses, considerations for applying the pattern, and an example based on Microsoft Azure. Most patterns include code samples or snippets that show how to implement the pattern on Azure. However, most patterns are relevant to any distributed system, whether hosted on Azure or other cloud platforms.

Cloud workloads are prone to the fallacies of distributed computing. Some examples of cloud design fallacies are:

  • The network is reliable
  • Latency is zero
  • Bandwidth is infinite
  • The network is secure
  • Topology doesn't change
  • There is one administrator
  • Component versioning is simple
  • Observability implementation can be delayed

Design patterns don't eliminate misconceptions such as these, but they can raise awareness of them and provide compensations and mitigations. Each cloud pattern has its own trade-offs. Pay more attention to why you're choosing a certain pattern than to how to implement it.
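As an illustration of how a pattern compensates for a fallacy such as "the network is reliable", the following is a minimal Python sketch of the Retry pattern with exponential backoff. The names `TransientError`, `retry`, and `make_flaky` are hypothetical and exist only for this example; a real workload would more likely rely on the retry support built into its client library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError wait and try again, up to `attempts` calls."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff plus jitter so clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


def make_flaky(failures):
    """Build a stand-in remote call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def call():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated network fault")
        return "ok"

    return call, state
```

The trade-off is visible even in this small sketch: retries hide short outages from the caller, but they add latency and extra load on a service that may already be struggling, which is why the attempt count and delays are explicit parameters.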

Challenges in cloud development

Data Management

Data management is the key element of cloud applications, and it influences most of the quality attributes. Data is typically hosted in different locations and across multiple servers for performance, scalability, or availability, which presents various challenges. For example, data consistency must be maintained, and data typically needs to be synchronized across different locations.
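The consistency challenge shows up as soon as a second copy of the data exists. The following minimal Python sketch of the Cache-Aside pattern uses plain dictionaries as stand-ins for a data store and a cache; the class name `CacheAside` and its fields are assumptions made for illustration, not part of any Azure API.

```python
class CacheAside:
    """Read through a cache, and invalidate cached entries on write."""

    def __init__(self, store):
        self.store = store      # dict standing in for the system of record
        self.cache = {}         # dict standing in for a cache service
        self.store_reads = 0    # counts how often the store is actually hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no store access
        self.store_reads += 1
        value = self.store[key]         # cache miss: load on demand
        self.cache[key] = value
        return value

    def update(self, key, value):
        self.store[key] = value         # write to the system of record first
        self.cache.pop(key, None)       # then drop the now-stale cached copy
```

Invalidating on write keeps a single process consistent, but with several application instances or cache replicas a reader can still see a stale value for a short time, so a real deployment would typically also set an expiration on cached entries.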

Design and Implementation

Good design encompasses consistency and coherence in component design and deployment, maintainability to simplify administration and development, and reusability to allow components and subsystems to be used in other applications and scenarios. Decisions made during the design and implementation phase significantly impact the quality and total cost of ownership of cloud-hosted applications and services.

Messaging

The distributed nature of cloud applications requires a messaging infrastructure that connects the components and services, ideally in a loosely coupled manner to maximize scalability. Asynchronous messaging is widely used and provides many benefits, but it also brings challenges such as message ordering, poison message management, idempotency, and more.
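Two of these challenges, idempotency and poison messages, can be sketched with an in-memory queue. The Python example below is a simplified illustration under stated assumptions: each message is a dictionary with a unique `id`, the function name `consume` and the `max_deliveries` limit are hypothetical, and a real message broker would usually provide delivery counts and a dead-letter queue itself.

```python
from collections import deque


def consume(queue, handler, max_deliveries=3):
    """Drain `queue` and return (processed_ids, dead_letters)."""
    processed_ids = set()   # IDs already handled, so redelivery is harmless
    dead_letters = []       # poison messages parked for later inspection
    deliveries = {}         # message ID -> number of delivery attempts

    while queue:
        message = queue.popleft()
        msg_id = message["id"]
        if msg_id in processed_ids:
            continue  # duplicate delivery: skip instead of processing twice
        deliveries[msg_id] = deliveries.get(msg_id, 0) + 1
        try:
            handler(message)
            processed_ids.add(msg_id)
        except Exception:
            if deliveries[msg_id] >= max_deliveries:
                dead_letters.append(message)   # give up on a poison message
            else:
                queue.append(message)          # put it back for redelivery
    return processed_ids, dead_letters
```

For example, calling `consume(deque(messages), handler)` with a duplicated message and a malformed one processes the duplicate once and returns the malformed message in `dead_letters` after three failed attempts, without blocking the messages queued behind it.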

Catalog of patterns

Pattern | Summary | Category
Ambassador | Create helper services that send network requests on behalf of a consumer service or application. | Design and Implementation, Operational Excellence
Anti-Corruption Layer | Implement a façade or adapter layer between a modern application and a legacy system. | Design and Implementation, Operational Excellence
Asynchronous Request-Reply | Decouple backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response. | Messaging
Backends for Frontends | Create separate backend services to be consumed by specific frontend applications or interfaces. | Design and Implementation
Bulkhead | Isolate elements of an application into pools so that if one fails, the others will continue to function. | Reliability
Cache-Aside | Load data on demand into a cache from a data store. | Data Management, Performance Efficiency
Choreography | Let each service decide when and how a business operation is processed, instead of depending on a central orchestrator. | Messaging, Performance Efficiency
Circuit Breaker | Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource. | Reliability
Claim Check | Split a large message into a claim check and a payload to avoid overwhelming a message bus. | Messaging
Compensating Transaction | Undo the work performed by a series of steps, which together define an eventually consistent operation. | Reliability
Competing Consumers | Enable multiple concurrent consumers to process messages received on the same messaging channel. | Messaging
Compute Resource Consolidation | Consolidate multiple tasks or operations into a single computational unit. | Design and Implementation
CQRS | Segregate operations that read data from operations that update data by using separate interfaces. | Data Management, Design and Implementation, Performance Efficiency
Deployment Stamps | Deploy multiple independent copies of application components, including data stores. | Reliability, Performance Efficiency
Edge Workload Configuration | Address the difficulty of configuring workloads across the great variety of systems and devices on the shop floor. | Design and Implementation
Event Sourcing | Use an append-only store to record the full series of events that describe actions taken on data in a domain. | Data Management, Performance Efficiency
External Configuration Store | Move configuration information out of the application deployment package to a centralized location. | Design and Implementation, Operational Excellence
Federated Identity | Delegate authentication to an external identity provider. | Security
Gatekeeper | Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. | Security
Gateway Aggregation | Use a gateway to aggregate multiple individual requests into a single request. | Design and Implementation, Operational Excellence
Gateway Offloading | Offload shared or specialized service functionality to a gateway proxy. | Design and Implementation, Operational Excellence
Gateway Routing | Route requests to multiple services using a single endpoint. | Design and Implementation, Operational Excellence
Geodes | Deploy backend services into a set of geographical nodes, each of which can service any client request in any region. | Reliability, Operational Excellence
Health Endpoint Monitoring | Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. | Reliability, Operational Excellence
Index Table | Create indexes over the fields in data stores that are frequently referenced by queries. | Data Management, Performance Efficiency
Leader Election | Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. | Design and Implementation, Reliability
Materialized View | Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. | Data Management, Operational Excellence, Performance Efficiency
Pipes and Filters | Break down a task that performs complex processing into a series of separate elements that can be reused. | Design and Implementation, Messaging
Priority Queue | Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those with a lower priority. | Messaging, Performance Efficiency
Publisher/Subscriber | Enable an application to announce events to multiple interested consumers asynchronously, without coupling the senders to the receivers. | Messaging
Queue-Based Load Leveling | Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads. | Reliability, Messaging, Resiliency, Performance Efficiency
Rate Limiting | Limit the rate of requests sent to a throttled service to avoid or minimize throttling errors and to predict throughput more accurately. | Reliability
Retry | Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that previously failed. | Reliability
Saga | Manage data consistency across microservices in distributed transaction scenarios. A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step. | Messaging
Scheduler Agent Supervisor | Coordinate a set of actions across a distributed set of services and other remote resources. | Messaging, Reliability
Sequential Convoy | Process a set of related messages in a defined order, without blocking processing of other groups of messages. | Messaging
Sharding | Divide a data store into a set of horizontal partitions or shards. | Data Management, Performance Efficiency
Sidecar | Deploy components of an application into a separate process or container to provide isolation and encapsulation. | Design and Implementation, Operational Excellence
Static Content Hosting | Deploy static content to a cloud-based storage service that can deliver it directly to the client. | Design and Implementation, Data Management, Performance Efficiency
Strangler Fig | Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services. | Design and Implementation, Operational Excellence
Throttling | Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. | Reliability, Performance Efficiency
Valet Key | Use a token or key that provides clients with restricted direct access to a specific resource or service. | Data Management, Security
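
To make one catalog entry concrete, the following is a minimal Python sketch of the Circuit Breaker pattern. It is an illustration under stated assumptions rather than a production implementation: the class names, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all hypothetical, and the sketch isn't thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each remote invocation, for example `breaker.call(lambda: client.get(url))` with a hypothetical `client`. Unlike a plain retry, the breaker stops sending requests to a dependency that keeps failing and probes it again only after `reset_timeout` seconds.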