Training
Module
Microsoft Azure Well-Architected Framework - Performance efficiency - Training
Learn how to optimize your workload to meet performance requirements while balancing costs, reliability, and security.
In this series, we walk through several cloud application scenarios, showing how a development team used load tests and metrics to diagnose performance issues. These articles are based on actual load testing that we performed when developing example applications. The code for each scenario is available on GitHub.
Scenarios:
Performance is frequently measured in terms of throughput, response time, and availability. Performance targets should be based on business operations. Customer-facing tasks may have more stringent requirements than operational tasks such as generating reports.
Define a service level objective (SLO) that specifies performance targets for each workload. You typically achieve this objective by breaking a performance target into a set of Key Performance Indicators (KPIs), such as:
Performance targets should explicitly include a target load. Also, not all users receive exactly the same level of performance, even when accessing the system simultaneously and performing the same work. So an SLO should be framed in terms of percentiles.
An example SLO might be: "Client requests have a response within 500 ms @ P90, at loads up to 25,000 requests/second."
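Here is a minimal sketch of checking load-test measurements against an SLO like this one. The Python below computes a nearest-rank percentile over a list of response times and applies the 500 ms @ P90 target only when the observed load is within the 25,000 requests/second limit. The function names are hypothetical. Nearest-rank is only one of several percentile definitions, and monitoring tools may interpolate between samples and report slightly different values.

```python
import math
from typing import Optional, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(
    latencies_ms: Sequence[float],
    observed_rps: float,
    target_ms: float = 500.0,   # "within 500 ms"
    pct: float = 90.0,          # "@ P90"
    max_rps: float = 25_000,    # "at loads up to 25,000 requests/second"
) -> Optional[bool]:
    """Check latencies from one load-test run against the example SLO.

    Returns None when the observed load exceeds the SLO's target load,
    because the SLO makes no promise above that load.
    """
    if observed_rps > max_rps:
        return None
    return percentile(latencies_ms, pct) <= target_ms
```

For example, ten samples in which one request takes 2,000 ms still meet the target at P90, but two slow samples out of ten do not. Returning None above the target load keeps a result from an overloaded run from being reported as a pass or a fail.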
It can be especially challenging to diagnose performance issues in a distributed application. Some of the challenges are:
A single business transaction or operation typically involves multiple components of the system. It can be hard to get a holistic end-to-end view of a single operation.
Resource consumption is distributed across multiple nodes. To get a consistent view, you need to aggregate logs and metrics in one place.
The cloud offers elastic scale. Autoscaling is an important technique for handling spikes in load, but it can also mask underlying issues. Also, it can be hard to know which components need to scale and when.
Workloads often don't scale across cores or threads. It's important to understand the requirements of your workloads and look into better-optimized virtual machine (VM) sizes. Some VM sizes offer constrained vCPU counts and disabled hyperthreading, which can help workloads that are single-core oriented and reduce costs for workloads that are licensed per core.
Cascading failures can cause failures upstream of the root problem. As a result, the first signal of the problem may appear in a different component than the root cause.
Performance tuning is both an art and a science, but it can be made closer to science by taking a systematic approach. Here are some best practices:
Enable telemetry to collect metrics. Instrument your code. Follow best practices for monitoring. Use correlated tracing so that you can view all the steps in a transaction.
Monitor the 90th, 95th, and 99th percentiles, not just the average. The average can mask outliers. The sampling rate for metrics also matters: if it is too low, it can hide spikes or outliers that might indicate problems.
Attack one bottleneck at a time. Form a hypothesis and test it by changing one variable at a time. Removing one bottleneck will often uncover another bottleneck further upstream or downstream.
Errors and retries can have a large impact on performance. If you see that backend services are throttling your system, scale out or try to optimize usage (for example, by tuning database queries).
Look for common performance anti-patterns.
Look for opportunities to parallelize. Two common sources of bottlenecks are message queues and databases. In both cases, sharding can help. For more information, see Horizontal, vertical, and functional data partitioning. Look for hot partitions that might indicate imbalanced read or write loads.
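The correlated-tracing practice in the list above can be sketched with only the Python standard library. Each service reads a correlation ID from the incoming request, or creates one. It keeps that ID in a `contextvars.ContextVar`, stamps it on every log entry, and forwards it on outbound calls. The header name `x-correlation-id`, the in-memory `trace_log`, and the service functions are hypothetical stand-ins. Production systems typically rely on a tracing SDK such as OpenTelemetry and the W3C Trace Context `traceparent` header instead.

```python
import contextvars
import uuid

# Correlation ID of the operation being handled in the current context.
correlation_id = contextvars.ContextVar("correlation_id", default=None)

# Hypothetical stand-in for a central log store shared by all services.
trace_log = []


def log(component, message):
    """Record a log entry tagged with the current correlation ID."""
    trace_log.append(
        {"correlation_id": correlation_id.get(), "component": component, "message": message}
    )


def backend_service(headers):
    """Hypothetical downstream service: adopts the caller's correlation ID."""
    token = correlation_id.set(headers["x-correlation-id"])
    try:
        log("backend", "query executed")
    finally:
        correlation_id.reset(token)


def handle_request(headers):
    """Hypothetical front end: reuse an incoming ID or start a new one."""
    cid = headers.get("x-correlation-id") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        log("frontend", "request received")
        # Propagate the ID on the outbound call (simulated here as a header dict).
        backend_service({"x-correlation-id": correlation_id.get()})
        log("frontend", "response sent")
    finally:
        correlation_id.reset(token)
    return cid


def steps_for(cid):
    """Reassemble every step of one operation from the central log."""
    return [(e["component"], e["message"]) for e in trace_log if e["correlation_id"] == cid]
```

Calling `steps_for` with the ID returned by `handle_request` yields the front-end and back-end steps of that one operation in order, even when many operations are interleaved in the same log.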
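To make the percentile and sampling-rate advice above concrete, this sketch uses made-up numbers: 95 requests at 100 ms and 5 at 3,000 ms, plus a one-minute CPU series with a two-second spike. The data is illustrative only and does not come from the load tests in this series. The percentile helper uses the nearest-rank definition, which is an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative latencies in ms: 95 fast requests and 5 very slow ones.
latencies = [100] * 95 + [3000] * 5

mean_ms = sum(latencies) / len(latencies)  # 245.0
p90_ms = percentile(latencies, 90)         # 100
p99_ms = percentile(latencies, 99)         # 3000

# Illustrative CPU % sampled once per second for one minute, with a 2-second spike.
cpu = [30] * 60
cpu[23] = cpu[24] = 98

peak_1s = max(cpu)         # 98: per-second sampling sees the spike
peak_10s = max(cpu[::10])  # 30: samples at t=0, 10, ..., 50 miss it
```

The mean of 245 ms looks healthy, but P99 exposes the slow tail: 5 of the 100 requests take 3 seconds. Likewise, sampling the CPU series every 10 seconds never lands on the spike, so the sampled peak stays at 30%.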
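The performance cost of errors and retries, mentioned above, can be estimated with a short calculation. Assume each attempt fails independently with probability p and a client retries up to r times. A logical request then makes 1 + p + p^2 + ... + p^r attempts on average, which equals (1 - p^(r+1)) / (1 - p) for p < 1. This is a simplification: under throttling, failures are usually correlated, so real amplification can be worse. The function names are hypothetical.

```python
def expected_attempts(failure_rate, max_retries):
    """Average calls per logical request, assuming each attempt fails
    independently with probability failure_rate."""
    if not 0 <= failure_rate <= 1:
        raise ValueError("failure_rate must be in [0, 1]")
    # Attempt k (0-based) happens only if the k earlier attempts all failed.
    return sum(failure_rate ** k for k in range(max_retries + 1))


def backend_calls_per_second(client_rps, failure_rate, max_retries):
    """Load the backend actually sees once retries are included."""
    return client_rps * expected_attempts(failure_rate, max_retries)
```

With a 50% failure rate and 3 retries, 1,000 client requests/second become 1,875 backend calls/second. That extra load can deepen the throttling that caused the failures in the first place.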
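The last practice, looking for hot partitions, can be approximated by hashing each key to a partition and comparing per-partition counts against the mean. In this sketch the SHA-256-based `partition_for` scheme and the 2x threshold are assumptions for illustration. Real message brokers and databases use their own partitioning schemes.

```python
import hashlib
from collections import Counter
from typing import List, Sequence


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    Python's built-in hash() is randomized per process for str, so it is not
    used here; SHA-256 gives the same assignment on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def hot_partitions(keys: Sequence[str], num_partitions: int, threshold: float = 2.0) -> List[int]:
    """Return partitions whose request count exceeds threshold x the mean count."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    mean = len(keys) / num_partitions
    return sorted(p for p, c in counts.items() if c > threshold * mean)
```

For instance, if a single customer key accounts for two-thirds of the requests, its partition is flagged even though the remaining keys are spread evenly across the others.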
Read the performance tuning scenarios