Performance tuning a distributed application

In this series, we walk through several cloud application scenarios, showing how a development team used load tests and metrics to diagnose performance issues. These articles are based on actual load testing that we performed when developing example applications. The code for each scenario is available on GitHub.

Scenarios:

What is performance?

Performance is frequently measured in terms of throughput, response time, and availability. Performance targets should be based on business operations. Customer-facing tasks may have more stringent requirements than operational tasks such as generating reports.

Define a service level objective (SLO) that sets performance targets for each workload. You typically achieve this objective by breaking a performance target into a set of key performance indicators (KPIs), such as:

  • Latency or response time of specific requests
  • The number of requests performed per second
  • The rate at which the system generates exceptions

Performance targets should explicitly include a target load. Also, not all users receive exactly the same level of performance, even when accessing the system simultaneously and performing the same work. So an SLO should be framed in terms of percentiles.

An example SLO might be: "Client requests have a response within 500 ms @ P90, at loads up to 25K requests/second."
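To make the percentile framing concrete, here is a minimal sketch that computes a latency percentile from recorded request durations and checks the latency portion of the example SLO above. The sample data and helper names are hypothetical; in practice, a monitoring service computes these values for you.

```python
# Minimal sketch: evaluating the latency part of an SLO (500 ms @ P90)
# from recorded request durations. Sample data is hypothetical.
import math

def percentile(samples_ms: list[float], p: float) -> float:
    """Return the p-th percentile (0-100) using the nearest-rank method."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_slo(samples_ms: list[float], target_ms: float = 500, p: float = 90) -> bool:
    """True if the p-th percentile latency is within the target."""
    return percentile(samples_ms, p) <= target_ms

# Example: latencies (in milliseconds) collected during a load test.
latencies = [120, 180, 220, 250, 310, 420, 480, 510, 650, 900]
print(f"P90 = {percentile(latencies, 90)} ms, SLO met: {meets_slo(latencies)}")
```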

Challenges of performance tuning a distributed system

It can be especially challenging to diagnose performance issues in a distributed application. Some of the challenges are:

  • A single business transaction or operation typically involves multiple components of the system. It can be hard to get a holistic end-to-end view of a single operation.

  • Resource consumption is distributed across multiple nodes. To get a consistent view, you need to aggregate logs and metrics in one place.

  • The cloud offers elastic scale. Autoscaling is an important technique for handling spikes in load, but it can also mask underlying issues. Also, it can be hard to know which components need to scale and when.

  • Some workloads don't scale well across cores or threads. It's important to understand the requirements of your workload and choose compute sizes that are optimized for it. For example, some sizes offer constrained cores or disabled hyperthreading to improve single-core-oriented and per-core-licensed workloads.

  • Cascading failures can cause failures upstream of the root problem. As a result, the first signal of the problem may appear in a different component than the root cause.

General best practices

Performance tuning is both an art and a science, but a systematic approach brings it closer to a science. Here are some best practices:

  • Enable telemetry to collect metrics. Instrument your code. Follow best practices for monitoring. Use correlated tracing so that you can view all the steps in a transaction.

  • Monitor the 90th, 95th, and 99th percentiles, not just the average. The average can mask outliers. The sampling rate for metrics also matters: if the sampling rate is too low, it can hide spikes or outliers that might indicate problems.

  • Attack one bottleneck at a time. Form a hypothesis and test it by changing one variable at a time. Removing one bottleneck will often uncover another bottleneck further upstream or downstream.

  • Errors and retries can have a large impact on performance. If you see that backend services are throttling your system, scale out or try to optimize usage (for example, by tuning database queries). See the backoff sketch after this list.

  • Look for common performance anti-patterns.

  • Look for opportunities to parallelize. Two common sources of bottlenecks are message queues and databases. In both cases, sharding can help. For more information, see Horizontal, vertical, and functional data partitioning. Look for hot partitions that might indicate imbalanced read or write loads, as illustrated in the partitioning sketch after this list.
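As an illustration of the retry guidance above, the following sketch retries a throttled backend call with exponential backoff and jitter, so that retries don't amplify the load that caused the throttling. The `call_backend` callable and `ThrottledError` exception are hypothetical placeholders; many client SDKs provide equivalent retry policies out of the box.

```python
# Minimal sketch: retrying a throttled backend call with exponential backoff
# and full jitter. `call_backend` and ThrottledError are hypothetical.
import random
import time

class ThrottledError(Exception):
    """Raised when the backend signals throttling (for example, HTTP 429)."""

def call_with_backoff(call_backend, max_attempts: int = 5, base_delay: float = 0.5):
    for attempt in range(max_attempts):
        try:
            return call_backend()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time in 0..(base * 2^attempt) seconds.
            delay = random.uniform(0, base_delay * (2 ** attempt))
            time.sleep(delay)
```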
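The next sketch illustrates hash-based partitioning: events are assigned to partitions by hashing a key, and per-partition write counts reveal hot partitions. The partition count and key values are hypothetical; real message brokers and databases typically apply a similar hash internally when you supply a partition key.

```python
# Minimal sketch: assigning events to partitions by hashing a key, then
# counting writes per partition to spot hot partitions. Values are hypothetical.
from collections import Counter
from hashlib import sha256

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    """Map a partition key to a partition using a stable hash."""
    digest = sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Count writes per partition for a batch of events keyed by customer ID.
events = ["customer-1", "customer-2", "customer-2", "customer-3", "customer-2"]
counts = Counter(partition_for(key) for key in events)

# A partition receiving a disproportionate share of writes is a hot partition.
print(counts.most_common())
```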

Next steps

Read the performance tuning scenarios