
Architecture strategies for performance testing

Applies to this Azure Well-Architected Framework Performance Efficiency checklist recommendation:

PE:06 Optimize your workload's performance by regularly testing in a production-like environment to ensure your workload reaches the desired performance targets and achieves your business objectives.

Performance testing is a nonfunctional testing practice used to evaluate how a workload behaves under various conditions. It helps you identify performance degradation early, address problems proactively, and ensure continued alignment with service-level agreements.

When you measure response times, throughput, resource usage, and stability, you gather evidence that your workload consistently meets defined targets and delivers the level of performance your business requires.

The key strategies in this article build on the foundational testing practices described in OE:09 Architecture strategies for testing. We recommend reviewing that article first. The recommendations in this guide are scoped to performance and focus on achieving performance targets so your workloads remain aligned with evolving business objectives.

The following table defines key performance terms used throughout this article.

Term | Definition
Performance targets | The specific performance values a workload must meet, such as response time, throughput, or number of concurrent users.
Performance thresholds | The boundaries that separate acceptable performance from unacceptable performance for a given metric.
Performance budget | The portion of an overall performance target allocated to each layer or component of a workload.
Error budget | The allowed level of errors or failures, derived from SLOs.
Acceptance criteria | The conditions a test result must satisfy for the workload to pass its performance requirements.
Hypothesis-driven experimentation | A testing method where you state a prediction about performance, test it against a baseline, and validate it with measured results.
Performance baseline | A set of metrics that represent the behavior of a workload under normal conditions as validated by testing.
Synthetic transactions | Scripted requests that simulate real user interactions to measure system performance under controlled conditions.
Performance regression | A decline in performance compared to an established baseline, introduced by a change in code, configuration, or infrastructure.
Performance drift | A gradual decline in performance over time that goes unnoticed without regular testing against established baselines.

Set measurable goals for your performance tests

Measurable performance goals turn subjective expectations into objective criteria that you can test and validate.

Define your performance targets and assign budgets. Document specific performance targets, such as how many concurrent users you need to support. Make sure these targets align with your service-level objectives (SLOs), and translate them into measurable test objectives.

Assign performance and error budgets across different layers of your workload. When performance tests fail, your budgets help you identify which layer is responsible and where to focus optimization efforts. Without budgets, failing tests only tell you that performance targets aren't being met, not where the problem lies.

For example, you might set budgets of 400 ms for API response time, 150 ms for database queries, and a 1% cap on failed requests. When a test fails, you can check each layer's results against its budget to determine whether the problem is slow API responses, slow database queries, or a spike in errors.
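To make that check concrete, here's a minimal Python sketch that maps the example budgets above to measured results and reports which layer exceeded its budget. The budget values mirror the example; the measurement values are hypothetical inputs you'd collect from a test run.

```python
# Budgets from the example above; measurements are hypothetical test output.
BUDGETS = {
    "api_response_ms": 400,   # API response time budget
    "db_query_ms": 150,       # database query budget
    "error_rate_pct": 1.0,    # cap on failed requests
}

def find_budget_violations(measurements: dict[str, float]) -> list[str]:
    """Return the layers whose measured value exceeds its budget."""
    return [
        metric for metric, budget in BUDGETS.items()
        if measurements.get(metric, 0.0) > budget
    ]

# A failed test run where the database layer is responsible.
violations = find_budget_violations(
    {"api_response_ms": 380, "db_query_ms": 210, "error_rate_pct": 0.4}
)
print(violations)  # ['db_query_ms'] -> focus optimization on the database
```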

Note

Avoid defining SLOs before understanding your user flows and performance requirements. SLOs should be based on real user needs and business goals, not arbitrary targets.

Define acceptance criteria with clear pass and fail thresholds. Base your acceptance criteria on performance metrics such as latency, response times, throughput, resource utilization, error rates, and any other performance indicators that align with your performance targets.

Define thresholds for each metric so your tests produce clear pass or fail results. If your SLO requires 95% of requests to complete within 200 ms, set the API response time threshold to 200 ms at the 95th percentile. Any test run where the 95th percentile exceeds 200 ms is a fail.
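As a minimal sketch of that pass/fail rule, the following Python snippet evaluates the 95th-percentile latency against the 200 ms threshold from the SLO above. The latency samples are hypothetical.

```python
import statistics

THRESHOLD_MS = 200
PERCENTILE = 95

def evaluate_run(latencies_ms: list[float]) -> bool:
    """Pass if the 95th-percentile latency stays at or under the threshold."""
    # quantiles with n=100 yields the 1st..99th percentile cut points.
    p95 = statistics.quantiles(latencies_ms, n=100)[PERCENTILE - 1]
    print(f"p95 = {p95:.1f} ms (threshold {THRESHOLD_MS} ms)")
    return p95 <= THRESHOLD_MS

latencies = [120, 135, 160, 180, 190, 205, 150, 170, 140, 230, 165, 175]
print("PASS" if evaluate_run(latencies) else "FAIL")
```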

Start early and test continuously

Early performance analysis catches architectural bottlenecks before they become costly to remedy.

Start performance testing as early as possible in the software development lifecycle of your workload. You don't need a complete application to begin. Developers can profile code locally, measure response times, and identify resource-intensive operations. Early testing informs design decisions, validates architectural choices against performance goals, and identifies optimization opportunities.

Continuously test your workload as it evolves to meet new requirements. Each code change might introduce performance regressions. Run tests regularly to catch these changes early. Incorporate performance tests in deployment pipelines and run periodic automated tests to detect performance drift before it reaches production.
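As an illustration of a pipeline gate, the following minimal Python sketch could run as a deployment step and fail the build when average latency exceeds a limit. The endpoint URL, sample count, and 500 ms limit are assumptions; dedicated services such as Azure Load Testing provide this capability in a managed form.

```python
import sys
import time
import urllib.request

ENDPOINT = "https://staging.example.com/health"  # hypothetical URL
MAX_AVG_MS = 500
SAMPLES = 20

def measure_avg_latency_ms() -> float:
    total = 0.0
    for _ in range(SAMPLES):
        start = time.perf_counter()
        urllib.request.urlopen(ENDPOINT, timeout=10).read()
        total += (time.perf_counter() - start) * 1000
    return total / SAMPLES

avg = measure_avg_latency_ms()
print(f"average latency: {avg:.1f} ms")
# A nonzero exit code fails the pipeline step and blocks the deployment.
sys.exit(0 if avg <= MAX_AVG_MS else 1)
```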

Tradeoff. Early performance testing requires dedicated infrastructure and specialized expertise, which increases operational costs. Balance this investment against the cost of performance problems discovered late and production incidents.

Test under real-world conditions

Performance tests should match real-world conditions so your results are meaningful.

Mirror your production environment

Your test environment should mirror production as closely as practical. Tailor your approach based on your workload's risk profile.

For mission-critical workloads, match production exactly across:

  • Compute SKUs and configurations
  • Autoscaling settings
  • Caching configurations
  • Network conditions (latency, bandwidth)
  • External dependencies

For noncritical workloads, testing in a scaled-down environment that mimics production can provide useful insights at lower cost.

Prevent configuration drift. Configuration drift can lead to misleading test results. Implement automated checks to verify that your test environment matches production, and ensure the correct versions are deployed before running tests.
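A minimal sketch of such a check, comparing a few illustrative settings (SKU, autoscale limit, cache TTL); in practice you would source both sides from your infrastructure-as-code outputs or a configuration API rather than hard-coded values.

```python
# Hypothetical configuration snapshots; replace with real sources.
PRODUCTION = {"compute_sku": "P2v3", "max_instances": 10, "cache_ttl_s": 300}
TEST_ENV   = {"compute_sku": "P2v3", "max_instances": 4,  "cache_ttl_s": 300}

# Collect every setting where the test environment diverges from production.
drift = {
    key: (TEST_ENV.get(key), prod_value)
    for key, prod_value in PRODUCTION.items()
    if TEST_ENV.get(key) != prod_value
}

if drift:
    for key, (test_value, prod_value) in drift.items():
        print(f"DRIFT {key}: test={test_value} production={prod_value}")
    raise SystemExit("Environment drift detected; fix before testing.")
print("Test environment matches production.")
```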

Tradeoff. Full production replication for performance testing significantly increases infrastructure costs. Evaluate whether the risk of performance problems in production justifies the cost of dedicated performance testing infrastructure for your workload.

Validate performance in production

Test environments can't fully replicate real-world conditions that affect performance. Production tests expose problems that only surface under actual usage and provide accurate baselines for future optimization. Some performance requirements can only be validated where real users, data, and infrastructure intersect.

Run controlled production testing. Schedule tests during off-peak hours to understand how your workload behaves under resource exhaustion and how it recovers from failures.

Production testing reveals performance characteristics under actual conditions, including:

  • Realistic user behavior patterns and data volumes
  • True network latency and bandwidth variations
  • Geographic distribution effects
  • Third-party API performance and dependencies
  • Actual caching behavior and infrastructure characteristics

Use progressive testing techniques. Start with small percentages of traffic and gradually increase. Monitor response times, throughput, error rates, and resource utilization at each step. This gradual approach limits risk while identifying the breaking point, revealing bottlenecks, and providing an accurate view of system behavior under increasing demand.

Monitor production tests continuously to find problems early. Implement automated safeguards that halt tests if they negatively impact users, such as automated rollback mechanisms and real-time alerting. These techniques ensure quick response and minimize disruptions.
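A minimal sketch of this progressive approach with an automated safeguard: traffic ramps in small steps and the test halts as soon as the error rate breaches a limit. The run_step function is a hypothetical stand-in for your load generator and monitoring queries, and the step sizes and 1% limit are illustrative.

```python
import random

ERROR_RATE_LIMIT = 0.01           # halt if more than 1% of requests fail
TRAFFIC_STEPS_PCT = [1, 5, 10, 25, 50]

def run_step(traffic_pct: int) -> float:
    """Hypothetical: send traffic_pct of normal load, return error rate."""
    return random.uniform(0, 0.003) * traffic_pct  # simulated measurement

for pct in TRAFFIC_STEPS_PCT:
    error_rate = run_step(pct)
    print(f"{pct:>3}% traffic -> error rate {error_rate:.3%}")
    if error_rate > ERROR_RATE_LIMIT:
        print("Safeguard tripped: halting test and rolling back.")
        break  # trigger your rollback and alerting here
else:
    print("Ramp completed without breaching the error limit.")
```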

Note

When you run controlled performance tests in production, you need to allocate extra capacity to handle the additional load generated by the tests.

Risk: Production testing directly affects real customers because it can create extra load and disrupt traffic. Always implement safeguards, limit exposure, and have rollback plans ready. Balance the benefits of realistic testing against the potential business impact of disrupting live users.

Validate changes with hypothesis-driven experiments

Use hypothesis-driven experimentation to guide your performance testing. Design individual performance experiments so they produce meaningful results.

Start with a focused hypothesis about your workload's performance and define measurable success criteria that lead to actionable decisions. For example, your hypothesis might be: "Adding an index to the orders table reduces query time by 70% under peak load." Your baseline is the current schema, and the variant is the schema with the new index. Run the same load test against both versions. Capture query latency, database CPU usage, and throughput, then compare results to determine whether the hypothesis holds true.
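As a sketch of how that comparison might look, the following snippet evaluates the example hypothesis against hypothetical latency samples captured from identical load tests on the baseline schema and the indexed variant.

```python
import statistics

HYPOTHESIS_REDUCTION = 0.70   # "reduces query time by 70%"

baseline_ms = [410, 395, 430, 405, 420, 415]   # current schema (hypothetical)
variant_ms  = [115, 120, 110, 125, 118, 112]   # schema with new index

baseline_median = statistics.median(baseline_ms)
variant_median = statistics.median(variant_ms)
reduction = 1 - (variant_median / baseline_median)

print(f"median query time: {baseline_median} ms -> {variant_median} ms "
      f"({reduction:.0%} reduction)")
print("Hypothesis holds" if reduction >= HYPOTHESIS_REDUCTION
      else "Hypothesis rejected")
```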

Tradeoff. Hypothesis-driven experiments require running the same tests against both baseline and variant configurations, which increases infrastructure costs and test execution time. Focus experiments on high-impact changes where the potential performance gain justifies the additional testing effort.

Apply multiple performance test types

Performance testing covers a range of tests that assess speed, stability, and scalability under various conditions. Each test type targets a distinct performance aspect of your workload, uncovers unique insights, and enables a full evaluation that goes beyond functional testing.

Use multiple test types to validate your workload from different angles. For example, stress testing finds the breaking point under peak load, but only endurance testing reveals memory leaks that surface over hours or days.

Choose test types based on what you need to validate.

The following table shows when to use each test type and what it reveals about your workload. While this table isn't an exhaustive list, it serves as an illustrative example.

Testing type | Primary purpose | When to apply | What it reveals | Environment
Load testing | Verify system handles expected user volumes under normal and peak usage | Start early, run frequently | Baseline performance, capacity limits, scaling effectiveness | Staging or production-like environment
Stress testing | Understand system limits and breaking points | Before the system is production-ready | Maximum capacity, failure modes, recovery behavior | Dedicated performance testing environment
Spike testing | Ensure system handles sudden traffic spikes | Start early, especially for public-facing apps | Autoscaling responsiveness, queue handling, graceful degradation | Staging or production environment
Endurance/soak testing | Detect problems that only appear over extended periods | After initial load tests pass | Memory leaks, resource exhaustion, connection pool problems | Production-like environment with full resource allocation

Don't try to implement all test types immediately. Begin with basic load testing to understand your baseline performance. As you identify risks and gain experience, expand to stress testing, spike testing, and eventually endurance testing.
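A basic load test can be as simple as a burst of concurrent requests that records latency for a first baseline. The following minimal sketch assumes a hypothetical staging endpoint and small concurrency; dedicated tools such as Azure Load Testing replace this at scale.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://staging.example.com/api/orders"  # hypothetical endpoint
CONCURRENT_USERS = 10
REQUESTS_PER_USER = 5

def timed_request(_: int) -> float:
    """Issue one request and return its latency in milliseconds."""
    start = time.perf_counter()
    urllib.request.urlopen(URL, timeout=10).read()
    return (time.perf_counter() - start) * 1000

with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    latencies = list(pool.map(timed_request,
                              range(CONCURRENT_USERS * REQUESTS_PER_USER)))

print(f"median: {statistics.median(latencies):.0f} ms, "
      f"max: {max(latencies):.0f} ms over {len(latencies)} requests")
```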

Tradeoff. Performance testing across all test types requires significant time and infrastructure investment. Match your testing investment to your business risk.

Use real-world usage patterns and data characteristics

Testing with realistic data provides accurate insights into resource consumption, system behavior, and hidden performance problems.

Create diverse test data sets that represent various scenarios, user profiles, and data volumes. Use input variations and randomization to mimic real user diversity. Include edge cases that might cause performance problems, such as large payloads, complex queries, or high concurrency.

Your test data should look like real production data. Use synthetic data that shares the characteristics of your production data. Reserve properly anonymized production datasets for scenarios that exercise data management behaviors, such as transaction consistency, latency, and volume handling.

Simulate synthetic transactions that mimic real user workflows. Script these transactions and run them repeatedly to generate load that reflects how your workload is actually used.
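A minimal sketch of one scripted synthetic transaction: browse, search, and check out, with randomized inputs and human-like think time. The base URL, paths, and search terms are hypothetical placeholders.

```python
import random
import time
import urllib.parse
import urllib.request

BASE = "https://staging.example.com"  # hypothetical base URL
SEARCH_TERMS = ["laptop", "monitor", "keyboard", "headset"]

def run_user_workflow() -> None:
    """Walk one browse -> search -> checkout flow and log step latencies."""
    term = random.choice(SEARCH_TERMS)
    steps = [
        f"{BASE}/catalog",
        f"{BASE}/search?q={urllib.parse.quote(term)}",
        f"{BASE}/cart/checkout",
    ]
    for url in steps:
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=10).read()
        print(f"{url} -> {(time.perf_counter() - start) * 1000:.0f} ms")
        time.sleep(random.uniform(0.5, 2.0))  # human-like think time

run_user_workflow()  # run repeatedly, or in parallel, to generate load
```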

Your test scenarios should reflect actual usage patterns such as concurrent user access, peak load periods, and specific transaction sequences. Make sure scenarios align with business goals so performance outcomes reflect true user value.

When testing under load, include actual third-party API calls. Mocking external dependencies makes tests run faster and more predictably, but it hides real-world performance problems. If your app depends on a payment processor API, test with real calls to understand end-to-end latency.

Use test results to guide design decisions

Your test results drive design decisions by establishing reliable baselines and guiding optimization efforts.

Establish your baseline measurements. Reliable baselines help you identify trends and anomalies over time and determine whether optimization changes deliver improvements.

Record performance metrics during initial tests. This recording is your baseline, a snapshot of "normal" performance. In subsequent runs, compare new results against this baseline to detect performance changes. When you examine the data, consider criteria such as user impact, frequency, cost of fix, and risk of change to understand system behavior under various conditions. Look for patterns that show where performance degrades, and use this insight to prioritize optimization efforts.
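A minimal sketch of that record-and-compare loop: persist the first run's metrics as the baseline, then flag later runs that degrade beyond a tolerance. The file name, metric names, and 10% tolerance are assumptions to adjust for your workload.

```python
import json
import pathlib

BASELINE_FILE = pathlib.Path("baseline.json")
TOLERANCE = 0.10  # flag changes worse than 10%

def check_against_baseline(current: dict[str, float]) -> None:
    """Record the baseline on first run; flag regressions afterward."""
    if not BASELINE_FILE.exists():
        BASELINE_FILE.write_text(json.dumps(current))
        print("Baseline recorded.")
        return
    baseline = json.loads(BASELINE_FILE.read_text())
    for metric, value in current.items():
        change = (value - baseline[metric]) / baseline[metric]
        if change > TOLERANCE:  # higher is worse for these metrics
            print(f"REGRESSION {metric}: {baseline[metric]} -> {value} "
                  f"(+{change:.0%})")

check_against_baseline({"p95_latency_ms": 240.0, "error_rate_pct": 0.6})
```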

Optimization is an iterative process and should be driven by data. Set aside dedicated time in your development cycle for performance optimization. Use your baselines to measure the impact of changes and ensure they deliver the expected improvements without introducing regressions.

Correlate performance with business metrics. Connect performance improvements to business outcomes such as revenue, user engagement, customer satisfaction, and conversion rate to justify continued investment in performance optimization.

Note

Regularly review and update your baselines after significant changes to your workload, such as architectural changes, new features, or scaling adjustments. Doing so ensures that your performance targets remain relevant.

Keep test assets aligned with current usage patterns

Your performance test assets contain critical knowledge about your workload's expected behavior, acceptable performance thresholds, and realistic traffic patterns.

Organize test suites by type. Keep load tests, stress tests, and endurance tests in separate suites. Don't mix them. Each type has different setup requirements, run durations, and success criteria. Organized suites make it easier to run targeted tests, compare results across runs, and maintain each suite independently.

Refresh test data regularly. Stale test data leads to unrealistic results. Regenerate test data to reflect current production data characteristics whenever your data model changes, data volumes grow, or user demographics shift.
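One way to keep data fresh is to regenerate synthetic records from summary statistics of current production data rather than from raw records. A minimal sketch, where the profile values and field names are hypothetical and should be refreshed as production characteristics shift:

```python
import random
import string

# Hypothetical summary statistics derived from production data.
PRODUCTION_PROFILE = {
    "order_value_mean": 82.5,
    "order_value_stddev": 31.0,
    "items_per_order": [1, 1, 1, 2, 2, 3, 5],  # sampled distribution
}

def synthetic_order() -> dict:
    """Generate one synthetic order matching the production profile."""
    return {
        "order_id": "".join(random.choices(string.ascii_uppercase, k=8)),
        "value": round(random.gauss(PRODUCTION_PROFILE["order_value_mean"],
                                    PRODUCTION_PROFILE["order_value_stddev"]), 2),
        "items": random.choice(PRODUCTION_PROFILE["items_per_order"]),
    }

dataset = [synthetic_order() for _ in range(1000)]
print(dataset[0])
```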

Review test scenarios as your workload evolves. Schedule regular reviews to ensure your scenarios still reflect actual usage. Scenarios become outdated as:

  • User behavior shifts over time
  • Traffic patterns change as your user base grows
  • New features introduce different usage patterns
  • Infrastructure scales to meet new capacity requirements

Azure facilitation

Azure Pipelines enables you to integrate performance testing into your CI/CD pipeline. You can add load testing as a step in your pipeline to validate the performance and scalability of your applications.

Azure Chaos Studio helps you inject real-world faults into your application so that you can run controlled fault injection experiments. The experiments help you measure, understand, and improve your cloud application and service resilience.

Azure Load Testing is a load testing service that generates high-scale load on any application. Load Testing provides capabilities for automating load tests and integrating them into your continuous integration and continuous delivery (CI/CD) workflow. You can define test criteria, such as average response time or error thresholds, and automatically stop load tests based on specific error conditions. Load Testing offers a dashboard that provides live updates and detailed resource metrics of Azure application components during a load test. You can analyze the test results, identify performance bottlenecks, and compare multiple test runs to understand performance regressions over time.

Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to telemetry from your cloud and on-premises environments. Application Insights is an extension of Monitor that provides application performance monitoring (APM) features. You can use Application Insights to monitor applications during development, testing, and production.

Performance Efficiency checklist

Refer to the complete set of recommendations.