Resiliency Patterns and Guidance

Resiliency

Resiliency is the ability of a system to gracefully handle and recover from failures. Cloud hosting increases the likelihood of both transient and more permanent faults: applications are often multi-tenant, use shared platform services, compete for resources and bandwidth, communicate over the Internet, and run on commodity hardware. Maintaining resiliency therefore requires detecting failures and recovering from them quickly and efficiently.

The following patterns are related to maximizing resiliency in cloud-hosted applications.

Circuit Breaker Pattern

Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.
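
A circuit breaker is commonly described as a proxy with three states. In Closed, calls pass through and failures are counted. In Open, calls fail immediately without touching the remote service. In Half-Open, after a timeout a trial call is allowed through, and its success closes the circuit while its failure reopens it. The sketch below is a minimal, single-threaded illustration of that state machine; the class names `CircuitBreaker` and `CircuitOpenError`, the threshold and timeout defaults, and the injectable `clock` are assumptions made for this example, not part of any particular library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without being attempted because the circuit is open."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker sketch (hypothetical API)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures that trip the breaker
        self.reset_timeout = reset_timeout          # seconds to stay Open before a trial call
        self._clock = clock                         # injectable so tests can control time
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = "half-open"  # timeout elapsed: let one trial call through
            else:
                raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Success in Closed or Half-Open: close the circuit and reset the count.
        self.state = "closed"
        self._failures = 0
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call reopens immediately; otherwise trip at the threshold.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

Because each success resets the failure count, this sketch trips only on consecutive failures; a production breaker might instead count failures within a sliding time window, and would need locking if shared across threads.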

For more info, see the Circuit Breaker Pattern.

Compensating Transaction Pattern

Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the operations fails. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.
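
One way to realize this is to pair every step with a compensating action and, if a later step fails, run the compensations for the steps that already completed. The sketch below assumes a hypothetical `Step` type and `run_with_compensation` helper, and a hypothetical travel-booking scenario for illustration. It undoes steps in reverse order, which is a common choice rather than a strict requirement of the pattern. It also assumes a failed step made no changes of its own, and it does not persist its log or retry failed compensations, both of which a real implementation would likely need (compensating actions are best made idempotent so they can be safely retried).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of an eventually consistent operation and the action that undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run steps in order; if one fails, undo the completed steps. Returns True on success."""
    completed: List[Step] = []  # in practice, persist this log so recovery survives a crash
    for step in steps:
        try:
            step.action()
        except Exception:
            # Undo completed work, most recent first. The failed step itself is not
            # compensated because this sketch assumes it made no changes.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

For example, if booking a flight and a hotel succeeds but renting a car fails, the hotel and then the flight bookings are cancelled, leaving the system consistent again.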

For more info, see the Compensating Transaction Pattern.

Leader Election Pattern

Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. This pattern can help to ensure that tasks do not conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other task instances are performing.
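
A common way to elect a leader is to have every instance race for a time-limited lease (a distributed mutex) held in shared storage: whichever instance holds the lease acts as leader, and it must keep renewing the lease before it expires, so that if it crashes another instance can take over. The sketch below simulates that shared lease in-process with a hypothetical `LeaseStore` class and a hypothetical `run_once` loop body; a real deployment would rely on a lease or lock provided by a shared storage or coordination service, and the 15-second lease is an arbitrary assumption.

```python
import threading
import time
from typing import Callable, Optional


class LeaseStore:
    """Hypothetical in-process stand-in for a shared lease in external storage.

    At most one instance holds the lease at a time; it expires unless renewed.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()  # makes acquire/renew atomic across threads
        self._holder: Optional[str] = None
        self._expires_at = 0.0
        self._clock = clock

    def try_acquire(self, instance_id: str, duration: float) -> bool:
        """Acquire or renew the lease; return True if instance_id is now the leader."""
        with self._lock:
            now = self._clock()
            free = self._holder is None or now >= self._expires_at
            if free or self._holder == instance_id:
                self._holder = instance_id
                self._expires_at = now + duration
                return True
            return False

    @property
    def holder(self) -> Optional[str]:
        """Current leader, or None if the lease is free or has expired."""
        with self._lock:
            if self._holder is not None and self._clock() < self._expires_at:
                return self._holder
            return None


def run_once(instance_id: str, store: LeaseStore, lease_seconds: float = 15.0) -> str:
    """One iteration of a task instance: coordinate only while holding the lease."""
    if store.try_acquire(instance_id, lease_seconds):
        return f"{instance_id}: leader, coordinating other instances"
    return f"{instance_id}: follower, waiting for instructions"
```

Each instance would call `run_once` periodically, well inside the lease duration, so the leader renews its lease before it lapses while followers keep probing in case the leader fails.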

For more info, see the Leader Election Pattern.

Retry Pattern

Enable an application to handle temporary failures when connecting to a service or network resource by transparently retrying the operation in the expectation that the failure is transient. This pattern can improve the stability of the application.
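
A typical retry policy retries only errors believed to be transient, waits longer between successive attempts, and eventually gives up so that a long-lasting fault is reported rather than retried forever. The sketch below uses exponential backoff with full jitter; the `retry` helper, its defaults, and the choice of `ConnectionError` and `TimeoutError` as the "transient" exceptions are assumptions for illustration. The injectable `sleep` parameter exists only to make the helper testable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], *,
          max_attempts: int = 4,
          base_delay: float = 0.5,
          max_delay: float = 8.0,
          transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except transient:
            if attempt >= max_attempts:
                raise  # give up: surface the last transient error to the caller
            # Cap the exponential delay, then pick a random point below it (full jitter)
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0.0, delay))
            attempt += 1
```

Any exception not listed in `transient` propagates immediately without a retry. Retried operations should be idempotent, and pairing this helper with a circuit breaker keeps the application from repeatedly retrying a fault that is not transient.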

For more info, see the Retry Pattern.

Scheduler Agent Supervisor Pattern

Coordinate a set of actions across a distributed set of services and other remote resources, attempt to transparently handle faults if any of these actions fail, or undo the effects of the work performed if the system cannot recover from a fault. This pattern can add resiliency to a distributed system by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.
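
The pattern splits responsibilities three ways: a scheduler runs the steps of a workflow and records their progress in a state store, agents wrap the calls to remote services, and a supervisor periodically inspects the state store for steps that have failed or overrun their deadline, then either asks the scheduler to retry them or, when retries are exhausted, undoes the completed work. The sketch below compresses this into one hypothetical `Workflow` class with an in-memory dictionary as the state store and plain callables as agents; the status names, the per-step deadline field, and the retry limit are all assumptions for illustration, and a real system would persist the state and run the supervisor on a timer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepState:
    """One row in the (hypothetical) state store."""
    name: str
    status: str = "pending"   # pending | processing | done | failed | compensated
    attempts: int = 0
    complete_by: float = 0.0  # deadline after which a processing step counts as timed out


class Workflow:
    def __init__(self, agents: Dict[str, Callable[[], None]],
                 undo: Dict[str, Callable[[], None]],
                 timeout: float = 30.0, max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.agents = agents  # step name -> agent that calls the remote service
        self.undo = undo      # step name -> compensating action
        self.timeout = timeout
        self.max_attempts = max_attempts
        self.clock = clock
        self.store = {name: StepState(name) for name in agents}

    def schedule(self) -> None:
        """Scheduler: run pending steps in order, recording progress in the store."""
        for name, agent in self.agents.items():
            step = self.store[name]
            if step.status == "done":
                continue
            if step.status != "pending":
                return  # a failed or in-flight step blocks the rest; the supervisor decides
            step.status = "processing"
            step.attempts += 1
            step.complete_by = self.clock() + self.timeout
            try:
                agent()
            except Exception:
                step.status = "failed"
                return
            step.status = "done"

    def supervise(self) -> str:
        """Supervisor: retry failed or timed-out steps, or compensate if retries run out."""
        now = self.clock()
        # A step left "processing" past its deadline suggests the scheduler crashed mid-step.
        stuck: List[StepState] = [
            s for s in self.store.values()
            if s.status == "failed" or (s.status == "processing" and now > s.complete_by)
        ]
        if not stuck:
            return "healthy"
        if any(s.attempts >= self.max_attempts for s in stuck):
            for name in reversed(list(self.agents)):  # undo completed steps, latest first
                if self.store[name].status == "done":
                    self.undo[name]()
                    self.store[name].status = "compensated"
            return "compensated"
        for s in stuck:
            s.status = "pending"
        self.schedule()
        return "retried"
```

Keeping all progress in the state store is what lets the supervisor recover from a transient exception, a long-lasting fault, or a crashed process in the same way: it only needs to read the store to decide whether to retry or to undo.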

For more info, see the Scheduler Agent Supervisor Pattern.

patterns & practices Developer Center