Chapter 12 – Modeling Application Usage


patterns & practices Developer Center

Performance Testing Guidance for Web Applications

J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation

September 2007

Objectives

  • Learn the difference between concurrent users and user sessions and why this is important when defining input for Web load tests.
  • Learn how to identify individual usage scenarios.
  • Learn about the metrics that will help in developing realistic workload characterizations.
  • Learn how to incorporate individual usage scenarios and their variances into user groups.
  • Learn how to identify and model special considerations when blending groups of users into single models.
  • Learn how to construct realistic workload models for Web applications based on expectations, documentation, observation, log files, and other data available prior to the release of the application to production.

Overview

The most common purpose of Web load tests is to simulate the user’s experience as realistically as possible. For performance testing to yield results that are directly applicable to understanding the performance characteristics of an application in production, the tested workloads must represent a real-world production scenario. To create a reasonably accurate representation of reality, you must understand the business context for the use of the application, expected transaction volumes in various situations, expected user path(s) by volume, and other usage factors. By focusing on groups of users and how they interact with the application, this chapter demonstrates an approach to developing workload models that approximate production usage based on various data sources.

Testing a Web site in such a way that the test can reliably predict performance is often more art than science. As critical as realistic load and usage models are to accurate prediction, the data necessary to create these models is typically not directly available to the individuals who conduct the testing, and when it is available, it is rarely complete or comprehensive.

While it is certainly true that simulating unrealistic workload models can provide a team with valuable information when conducting performance testing, you can only make accurate predictions about performance in a production environment, or prioritize performance optimizations, when realistic workload models are simulated.

How to Use This Chapter

Use this chapter to understand how to develop a workload characterization that can be used in performance testing to simulate production characteristics. To get the most from this chapter:

  • Use the “Approach for Modeling Application Usage” section to get an overview of the approach to workload modeling and as a quick reference guide for you and your team.
  • Use the various activity sections to understand the details of the activities, and to find critical explanations of the concepts of user behavior involved in workload modeling.

Approach for Modeling Application Usage

The process of identifying one or more composite application usage profiles for use in performance testing is known as workload modeling. Workload modeling can be accomplished in any number of ways, but to varying degrees the following activities are conducted, either explicitly or implicitly, during virtually all performance-testing projects that are successful in predicting or estimating performance characteristics in a production environment:

  • Identify the objectives.
  • Identify key usage scenarios.
  • Determine navigation paths for key scenarios.
  • Determine individual user data and variances.
  • Determine the relative distribution of scenarios.
  • Identify target load levels.
  • Prepare to implement the model.

These activities are discussed in detail in the following sections.

Identify the Objectives

The objectives of creating a workload model typically center on ensuring the realism of a test, or on designing a test to address a specific requirement, goal, or performance-testing objective. (For more information, see Chapter 9 – Determine Performance Testing Objectives and Chapter 10 – Quantify End-User Response Time Goals.) When identifying the objectives, work with targets that will satisfy the stated business requirements. Consider the following key questions when formulating your objectives:

  • What is the current or predicted business volume over time? For example, how many orders are typically placed in a given time period, and what other activities — number of searches, browsing, logging in, and so on — support order placement?
  • How is the business volume expected to grow over time? Your projection should take into account future needs such as business growth, possible mergers, introduction of new products, and so on.
  • What is the current or predicted peak load level? This projection should reflect activities that support sales and other critical business processes, such as marketing campaigns, newly shipped products, time-sensitive activities such as stock exchange transactions dependent on external markets, and so on.
  • How quickly do you expect peak load levels to be reached? Your prediction should take into consideration unusual surges in business activity — how fast can the organization adjust to the increased demand when such an event happens?
  • How long do the peak load levels continue? That is, how long does the new demand need to be sustained before exhaustion of a resource compromises the service level agreements (SLAs)? For example, an economic announcement may cause the currency-exchange market to experience prolonged activity for two or three days, as opposed to just a few hours.
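
As a back-of-the-envelope illustration of how answers to these questions combine into a first target, consider the following minimal sketch. Every number in it is hypothetical, chosen only to show the arithmetic:

```python
# Hypothetical example: translating business-volume answers into a target
# peak load. All numbers are illustrative assumptions, not guidance.

current_orders_per_hour = 1_000      # current business volume
annual_growth = 0.25                 # projected year-over-year growth
pages_per_order = 12                 # searches, browsing, logins per order
campaign_spike_factor = 3.0          # surge during a marketing campaign

projected_orders = current_orders_per_hour * (1 + annual_growth)
peak_orders = projected_orders * campaign_spike_factor
peak_page_views = peak_orders * pages_per_order

print(f"Projected normal load:  {projected_orders:,.0f} orders/hour")
print(f"Target peak load:       {peak_orders:,.0f} orders/hour")
print(f"Target peak page views: {peak_page_views:,.0f} per hour")
```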

This information can be gathered from Web server logs, marketing documentation reflecting business requirements, or stakeholders. The following are some of the objectives identified during this process:

  • Ensure that one or more models represent the peak expected load of X orders being processed per hour.
  • Ensure that one or more models represent the difference between “quarterly close-out” period usage patterns and “typical business day” usage patterns.
  • Ensure that one or more models represent business/marketing projections for up to one year into the future.

It is acceptable if these objectives only make sense in the context of the project at this point. The remaining activities will help you fill in the necessary details to achieve the objectives.

Considerations

Consider the following key points when identifying objectives:

  • Throughout the process of creating workload models, remember to share your assumptions and drafts with the team and solicit their feedback.
  • Do not get overly caught up in striving for perfection, and do not fall into the trap of oversimplification. In general, it is a good idea to start executing tests when you have a testable model and then enhance the model incrementally while collecting results.

Identify Key Usage Scenarios

To simulate every possible user task or activity in a performance test is impractical, if not a sheer impossibility. As a result, no matter what method you use to identify key scenarios, you will probably want to apply some limiting heuristic to the number of activities or key scenarios you identify for performance testing. You may find the following limiting heuristics useful:

  • Include contractually obligated usage scenario(s).
  • Include usage scenarios implied or mandated by performance testing goals and objectives.
  • Include most common usage scenario(s).
  • Include business-critical usage scenario(s).
  • Include performance-intensive usage scenario(s).
  • Include usage scenarios of technical concern.
  • Include usage scenarios of stakeholder concern.
  • Include high-visibility usage scenarios.

The following information sources are frequently useful in identifying usage scenarios that fit into the categories above:

  • Requirements and use cases
  • Contracts
  • Marketing material
  • Interviews with stakeholders
  • Information about how similar applications are used
  • Observing and asking questions of beta-testers and prototype users
  • Your own experiences with how similar applications are used

If you have access to Web server logs for a current implementation of the application (whether it is a production implementation of a previous release, a representative prototype, or a beta release), you can use data from those logs to validate and/or enhance the data collected using the resources above.

After you have collected a list of what you believe are the key usage scenarios, solicit commentary from the team members. Ask what they think is missing, what they think can be de-prioritized, and, most importantly, why. An activity that does not seem to matter to one person may still be critical to include in the performance test, both because of the side effects it may have on the system as a whole and because the individual who deems it unimportant may be unaware of those consequences.

Considerations

Consider the following key points when identifying key usage scenarios:

  • Whenever you test a Web site that includes significant new features or functionality, use interviews. By interviewing the individuals responsible for selling or marketing the new features, you will learn which features and functions are expected and therefore most likely to be used. By interviewing existing users, you can determine which of the new features and functions they believe they are most likely to use.
  • When testing a pre-production Web site, the best option is to roll out a (stable) beta version to a group of representative users, roughly 10 to 20 percent of the size of the expected user base, and analyze the log files from their usage of the site.
  • Run simple in-house experiments using employees, customers, clients, friends, or family members to determine, for example, natural user paths and the page-viewing time differences between new and returning users. This is a highly effective method of data collection for Web sites that have never been live, as well as a way to validate data collected by other methods.
  • Remember to ask about usage by various user types, roles, or personas. It is frequently the case that team members will not remember to tell you about the less common user types or roles if you do not explicitly ask.
  • Think about nonhuman system users and batch processes as well as human end users. For example, there might be a batch process that runs to update the status of orders while users are performing activities on the site. Be sure to account for those processes, because they might be consuming resources.
  • For the most part, Web servers are very good at serving text and graphics. Static pages with average-size graphics are probably less critical than dynamic pages, forms, and multimedia pages.

Determine Navigation Paths for Key Scenarios

Now that you have a list of key scenarios, the next activity is to determine how individual users actually accomplish the tasks or activities related to those scenarios.

Human beings are unpredictable, and Web sites commonly offer redundant functionality. Even with a relatively small number of users, it is almost certain that real users will not only use every path you expect them to take to complete a task, but will also inevitably invent some paths you had not planned for. Each path a user takes to complete an activity will put a different load on the system. That difference may be trivial, or it may be enormous; there is no way to be certain until you test it. There are many methods to determine navigation paths, including:

  • Identifying the user paths within your Web application that are expected to have significant performance impact and that accomplish one or more of the identified key scenarios
  • Reading design and/or usage manuals
  • Trying to accomplish the activities yourself
  • Observing others trying to accomplish the activity without instruction

After the application is released for unscripted user acceptance testing, beta testing, or production, you will be able to determine how the majority of users accomplish activities on the system under test by evaluating Web server logs. It is always a good idea to compare your models against reality and make an informed decision about whether to do additional testing based on the similarities and differences found.

Apply the same limiting heuristics to navigation paths as you applied when determining which scenarios to include in your performance simulation, and share your findings with the team. Ask what they think is missing, what they think can be de-prioritized, and why.

Considerations

Consider the following key points when determining navigation paths for key scenarios:

  • Some users will complete more than one activity during a visit to your site.
  • Some users will complete the same activity more than once per visit.
  • Some users may not actually complete any activities during a visit to your site.
  • Navigation paths are often easiest to capture by using page titles.
  • If page titles do not work or are not intuitive for your application, the navigation path may be easily defined by steps the user takes to complete the activity.
  • First-time users frequently follow a different path to accomplish a task than users experienced with the application. Consider this difference and what percentage of new versus return user navigation paths you should represent in your model.
  • Different users will spend different amounts of time on the site. Some will log out, some will close their browser, and others will leave their session to time out. Take these factors into account when determining or estimating session durations.
  • When discussing navigation paths with your team or others, it is frequently valuable to use visual representations.
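
When it helps to make such a discussion concrete, a navigation path and its think-time variances can be captured in a few lines of code. The following minimal sketch uses hypothetical page titles and timings, and assumes an exponential spread of think times, a common (but not universal) modeling choice:

```python
# A minimal sketch: one navigation path captured by page title, with
# separate think-time assumptions for new and returning users.
# Page names and mean timings are hypothetical.
import random

navigation_path = ["Home", "Search Results", "Product Detail",
                   "Shopping Cart", "Checkout", "Order Confirmation"]

# Mean think time per page, in seconds (returning users move faster).
think_time_mean = {"new": 30.0, "returning": 12.0}

def think_time(user_type: str, rng: random.Random) -> float:
    """Sample a think time with an exponential spread around the mean."""
    return rng.expovariate(1.0 / think_time_mean[user_type])

rng = random.Random(7)  # seeded so the simulated run is repeatable
for page in navigation_path:
    print(f"{page:20s} pause {think_time('returning', rng):5.1f} s")
```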

Example Visual Representation

Figure 12.1  Workload for Key Scenarios

Determine Individual User Data and Variances

No matter how accurate the model representing navigation paths and usage scenarios is, it is not complete without accounting for the data used by and the variances associated with individual users. While thinking of users as interchangeable entities leads to tests being simpler to design and analyze, and even makes some classes of performance issues easier to detect, it masks much of the real-world complexity that your Web site is likely to encounter in production. Accounting for and simulating this complexity is crucial to finding the performance issues most likely to be encountered by real users, as well as being an essential element to making any predictions or estimations about performance characteristics in production. 

The sections that follow detail some of the sources of information from which to model individual user data and variances, and some of the data and variances that are important to consider when creating your model and designing your tests.

Web Site Metrics in Web Logs

For the purposes of this chapter, Web site metrics are the variables that help you understand a site’s traffic and load patterns from the server’s perspective. Web site metrics are generally averages that may vary with the flow of users accessing the site, but they generally provide a high-level view of the site’s usage that is helpful in creating models for performance testing. These metrics ultimately reside in the Web server logs. (There are many software applications that parse these logs to present these metrics graphically or otherwise, but these are outside of the scope of this chapter.) Some of the more useful metrics that can be read or interpreted from Web server logs (assuming that the Web server is configured to keep logs) include:

  • **Page views per period.**  A page view is a page request that includes all dependent file requests (.jpg files, .css files, and so on). Page views can be tracked over hourly, daily, or weekly time periods to account for cyclical patterns or bursts of peak user activity on the Web site.
  • **User sessions per period.**  A user session is the sequence of related requests originating from a single user's visit to the Web site. As with page views, user sessions can span hourly, daily, and weekly time periods.
  • **Session duration.**  This metric represents the amount of time a user session lasts, measured from the first page request until the last page request is completed. Session duration takes into account the amount of time the user pauses when navigating from page to page.
  • **Page request distribution.**  This metric represents the distribution, in percentages, of page hits according to functional types (Home, Login, Pay, and so on). The distribution percentages establish a weighting ratio of page hits based on actual user utilization of the Web site.
  • **Interaction speed.**  This metric represents the time users take to transition between pages when navigating the Web site, constituting their think-time behavior. It is important to remember that every user will interact with the Web site at a different rate.
  • **User abandonment.**  This metric represents the length of time that users will wait for a page to load before growing dissatisfied and exiting the site. Abandoned sessions are quite normal on the Internet and consequently will have an impact on load test results. The sketch following this list shows how two of these metrics can be derived from a log.
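
To make these definitions concrete, the following minimal sketch derives user sessions per period and session duration from a simplified log. The record format and the 30-minute inactivity timeout used to split sessions are assumptions; adapt both to your Web server's actual log configuration:

```python
# Sessionizing a simplified Web server log: requests from the same client
# separated by more than 30 minutes are treated as separate sessions.
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

# Each record: (client identifier, request timestamp).
log = [
    ("10.0.0.1", datetime(2007, 9, 1, 9, 0)),
    ("10.0.0.1", datetime(2007, 9, 1, 9, 5)),
    ("10.0.0.2", datetime(2007, 9, 1, 9, 7)),
    ("10.0.0.1", datetime(2007, 9, 1, 11, 0)),   # gap > 30 min: new session
]

sessions = {}  # client -> list of [first_request, last_request]
for client, ts in sorted(log, key=lambda record: record[1]):
    client_sessions = sessions.setdefault(client, [])
    if client_sessions and ts - client_sessions[-1][1] <= SESSION_TIMEOUT:
        client_sessions[-1][1] = ts          # extend the current session
    else:
        client_sessions.append([ts, ts])     # start a new session

all_sessions = [s for per_client in sessions.values() for s in per_client]
durations = [(end - start).total_seconds() for start, end in all_sessions]
print(f"User sessions: {len(all_sessions)}")
print(f"Average session duration: {sum(durations) / len(durations):.0f} s")
```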

Determine the Relative Distribution of Scenarios

Having determined which scenarios to simulate and what the steps and associated data are for those scenarios, and having consolidated those scenarios into one or more workload models, you now need to determine how often users perform each activity represented in the model relative to the other activities needed to complete the workload model.

Sometimes one workload distribution is not enough. Research and experience have shown that user activities often vary greatly over time. To ensure test validity, you must evaluate how activities are distributed by time of day, day of week, day of month, and time of year. As an example, consider an online bill-payment site. If all bills go out on the 20th of the month, the activity on the site immediately before the 20th will be focused on updating accounts, importing billing information, and so on by system administrators, while immediately after the 20th, customers will be viewing and paying their bills until the payment due date of the 5th of the next month. The most common methods for determining the relative distribution of activities include:

  • Extract the actual usage, load values, common and uncommon usage scenarios (user paths), user delay time between clicks or pages, and input data variance (to name a few) directly from log files.
  • Interview the individuals responsible for selling/marketing new features to find out what features/functions are expected and therefore most likely to be used. By interviewing existing users, you may also determine which of the new features/functions they believe they are most likely to use.
  • Deploy a beta release to a group of representative users (roughly 10-20 percent the size of the expected user base) and analyze the log files from their usage of the site.
  • Run simple in-house experiments using employees, customers, clients, friends, or family members to determine, for example, natural user paths and the page-viewing time differences between new and returning users.
  • As a last resort, you can use your intuition, or best guess, to make estimations based on your own familiarity with the site.

Teams and individuals use a wide variety of methods to consolidate individual usage patterns into one or more collective models. Some of those include spreadsheets, pivot tables, narrative text, Unified Modeling Language (UML) collaboration diagrams, Markov Chain diagrams, and flow charts. In each case the intent is to make the model as a whole easy to understand, maintain, and communicate across the entire team.

One highly effective method is to create visual models, intuitive to the entire team, of the navigation paths and the percentage of users you anticipate will perform each activity. The entire team here includes end users, developers, testers, analysts, and executive stakeholders. The key is to use language and visual representations that make sense to your team without extensive training; in fact, visual models are best when they convey their intended meaning without the need for any training at all. After you create such a model, circulate it to both users and stakeholders for review and comment. Following the steps taken to collect key usage scenarios, ask the team members what they think is missing, what they think can be de-prioritized, and why. Often, team members will simply write new percentages on the visual model, making it very easy for everyone to see which activities have achieved a consensus and which have not.

Once you are confident that the model is appropriate for performance testing, supplement that model with the individual usage data collected for each navigation path during the “Determine Individual User Data and Variances” activity, in such a way that the model contains all the data you need to create the actual test.
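
As an illustration of what that supplementary data can look like in executable form, the following minimal sketch turns a hypothetical relative distribution into a per-virtual-user scenario choice. Most load-generation tools offer an equivalent weighting mechanism, so treat this only as a model of the idea:

```python
# Choosing each simulated user's next scenario according to the agreed
# relative distribution. Scenario names and weights are hypothetical.
import random

scenario_weights = {           # must sum to 100
    "browse_catalog": 50,
    "search":         25,
    "place_order":    15,
    "manage_account": 10,
}

def next_scenario(rng: random.Random) -> str:
    """Pick the next scenario for a simulated user, honoring the weights."""
    names = list(scenario_weights)
    weights = list(scenario_weights.values())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for a repeatable test run
sample = [next_scenario(rng) for _ in range(10_000)]
for name in scenario_weights:
    print(f"{name}: {sample.count(name) / len(sample):.1%}")
```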

Figure 12.2  Visual Model of Navigation Paths

Considerations

Consider the following key points when determining the relative distribution of scenarios:

  • Create visual models and circulate them to users and stakeholders for review/comment.
  • Ensure that the model is intuitive to non-technical users, technical designers, and everyone in between.
  • Because performance tests frequently consume large amounts of test data, ensure that you include enough in your data files.
  • Ensure that the model contains all of the supplementary data necessary to create the actual test.

Identify Target Load Levels

A customer visit to a Web site comprises a series of related requests known as a user session. Users with different behaviors who navigate the same Web site are unlikely to produce identical, neatly overlapping request patterns on the Web server during their sessions. Therefore, instead of modeling the user experience on the basis of concurrent users, it is more useful to base your model on user sessions: sequences of actions in a navigational page flow, undertaken by a customer visiting a Web site.

Quantifying the Volume of Application Usage: Theory

It is frequently difficult to determine and express an application’s usage volume because Web-based multi-user applications communicate via stateless protocols. Although terms such as “concurrent users” and “simultaneous users” are frequently used, they can be misleading when applied to modeling user visits to a Web site. In Figures 12.3 and 12.4 below, each line segment represents a user activity, and different activities are represented by different colors. The solid black line segment represents the activity “load the Home page.” User sessions are represented horizontally across the graph. In this hypothetical representation, the same activity takes the same amount of time for each user. The time elapsed between the Start of Model and End of Model lines is one hour.

Figure 12.3  Server Perspective of User Activities

Figure 12.3 above represents usage volume from the perspective of the server (in this case, a Web server). Reading the graph from top to bottom and from left to right, you can see that user 1 navigates first to page “solid black” and then to pages “white,” “polka dot,” “solid black,” “white,” and “polka dot.” User 2 also starts with page “solid black,” but then goes to pages “zebra stripe,” “grey,” etc. You will also notice that virtually any vertical slice of the graph between the start and end times will reveal 10 users accessing the system, showing that this distribution is representative of 10 concurrent, or simultaneous, users. What should be clear is that the server knows that 10 activities are occurring at any moment in time, but not how many actual users are interacting with the system to generate those 10 activities.

Figure 12.4 below depicts another distribution of activities by individual users that would generate the server perspective graph above.

Figure 12.4  Actual Distribution of User Activities Over Time

In this graph, the activities of 23 individual users have been captured. Each of these users conducted some activity during the time span being modeled, and their respective activities can be thought of as 23 user sessions. Each of the 23 users began interacting with the site at a different time. There is no particular pattern to the order of activities, with the exception that every user started with the “solid black” activity. These 23 users actually perform the exact same activities in the same sequence shown in Figure 12.3; however, as depicted in Figure 12.4, at any given time there are only 9 or 10 concurrent users. The usage volume for this case can therefore be expressed in terms of total hourly users, or user sessions counted between “Start of Model” and “End of Model.”
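
The relationship between sessions and concurrency is easy to check with a sweep-line count over session start and end times. In the following minimal sketch, the 23 staggered 25-minute sessions are hypothetical values chosen to mirror the figures above:

```python
# Counting peak concurrent users from individual session intervals:
# +1 at each session start, -1 at each session end, track the maximum.
from datetime import datetime, timedelta

start = datetime(2007, 9, 1, 9, 0)
# 23 sessions of 25 minutes each, starting at evenly staggered offsets
# spread across one hour (hypothetical, echoing Figure 12.4).
sessions = [(start + timedelta(minutes=60 * i / 23),
             start + timedelta(minutes=60 * i / 23 + 25))
            for i in range(23)]

events = sorted([(s, +1) for s, _ in sessions] +
                [(e, -1) for _, e in sessions])
concurrent = peak = 0
for _, delta in events:
    concurrent += delta
    peak = max(peak, concurrent)

print(f"Total user sessions in the hour: {len(sessions)}")   # 23
print(f"Peak concurrent users:           {peak}")            # 10
```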

Without some degree of empirical data (for example, Web server logs from a previous release of the application), target load levels are exactly that — targets. These targets are most frequently set by the business, based on its goals related to the application and whether those goals are market penetration, revenue generation, or something else. These represent the numbers you want to work with at the outset.

Quantifying the Volume of Application Usage

If you have access to Web server logs for a current implementation of the application — whether it is a production implementation of a previous release, a representative prototype, or a beta release — you can use data from these logs to validate and/or enhance the data collected by using the resources above. By performing a quantitative analysis on Web server logs, you can determine:

  • The total number of visits to the site over a period of time (month/week/day).
  • The volume of usage, in terms of total averages and peak loads, on an hourly basis.
  • The duration of sessions for total averages and peak loads on an hourly basis.
  • The translation of total hourly averages and peak loads into overlapping user sessions, which establishes a realistic volume for the load test.
  • The business cycles or special events that result in significant changes in usage.
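
Assuming the log has already been broken into sessions (as in the earlier sessionization sketch), the hourly averages and peaks follow from simple grouping. The timestamps below are hypothetical:

```python
# Grouping session start times by hour to find average and peak volume.
from collections import Counter
from datetime import datetime

session_starts = [
    datetime(2007, 9, 1, 9, 5),   datetime(2007, 9, 1, 9, 40),
    datetime(2007, 9, 1, 10, 2),  datetime(2007, 9, 1, 10, 15),
    datetime(2007, 9, 1, 10, 47), datetime(2007, 9, 1, 14, 30),
]

per_hour = Counter(ts.replace(minute=0, second=0, microsecond=0)
                   for ts in session_starts)
peak_hour, peak_sessions = per_hour.most_common(1)[0]

print(f"Total visits:     {len(session_starts)}")
print(f"Average per hour: {len(session_starts) / len(per_hour):.1f}")
print(f"Peak hour:        {peak_hour:%Y-%m-%d %H:00} ({peak_sessions} sessions)")
```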

The following are the inputs and outputs used for determining target load levels.

Inputs

  • Usage data extracted from Web server logs
  • Business volume (both current and projected) mapping to objectives
  • Key scenarios
  • Distribution of work
  • Session characteristics (navigational path, duration, percentage of new users)

Output

By combining the volume information with objectives, key scenarios, user delays, navigation paths, and scenario distributions from the previous steps, you can determine the remaining details necessary to implement the workload model under a particular target load.
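
One common way to translate hourly session counts into overlapping sessions is the approximation known as Little's Law: average concurrency equals arrival rate multiplied by average session duration. The following sketch uses hypothetical target numbers:

```python
# Little's Law approximation: concurrency = arrival rate x session duration.
sessions_per_hour = 2_000        # target peak load from the business
avg_session_duration_s = 540     # from log analysis: 9 minutes

arrival_rate = sessions_per_hour / 3600.0           # sessions per second
avg_concurrent_sessions = arrival_rate * avg_session_duration_s

print(f"Arrival rate: {arrival_rate:.2f} sessions/s")
print(f"Average overlapping sessions to simulate: {avg_concurrent_sessions:.0f}")
```

For these hypothetical targets, the model calls for roughly 300 overlapping user sessions, which is the volume the load test would need to sustain.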

Integrating Model Variance

Because the usage models are “best guesses” until production data becomes available, it is a good idea to create no fewer than three usage models for each target load. This has the effect of adding a rough confidence interval to the performance measurements, so that stakeholders do not anchor on the results of a single test built on many fallible assumptions, and so the team can gauge how much any inaccuracies in those assumptions are likely to impact the performance characteristics of the application.

The three usage models that teams generally find most valuable are:

  • Anticipated Usage (the model or models you created in the “Determine Individual User Data and Variances” activity)
  • Best Case Usage, in terms of performance (that is, weighted heavily in favor of low-performance cost activities)
  • Worst Case Usage, in terms of performance (that is, weighted heavily in favor of high-performance cost activities)
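
One lightweight way to express these three models is as alternative weightings over the same key scenarios, so that the identical target load can be run against each. The scenario names and weights below are hypothetical; “expensive” here means high performance cost:

```python
# Three usage models as alternative scenario weightings (hypothetical).
usage_models = {
    "anticipated": {"browse_catalog": 50, "search": 25,
                    "place_order": 15, "manage_account": 10},
    "best_case":   {"browse_catalog": 70, "search": 20,    # cheap activities
                    "place_order": 5,  "manage_account": 5},
    "worst_case":  {"browse_catalog": 20, "search": 30,    # expensive ones
                    "place_order": 35, "manage_account": 15},
}

for name, weights in usage_models.items():
    assert sum(weights.values()) == 100, f"{name} weights must sum to 100"
    print(name, weights)
```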

The following chart is an example of the information that testing for all three of these models can provide. As you can see, in this particular case the Anticipated Usage and Best Case Usage resulted in similar performance characteristics. However, the Worst Case Usage showed nearly a 50-percent drop-off from the Anticipated Usage in the total load that can be supported. Such information could lead to a reevaluation of the usage model, or possibly to a decision to test with the Worst Case Usage model moving forward as a kind of safety factor until empirical data becomes available.

Figure 12.5  Usage Models

Considerations

Consider the following key points when identifying target load levels:

  • Although the volumes resulting from the activities above may or may not end up correlating to the loads the application will actually encounter, the business will want to know if and how well the application as developed or deployed will support its target loads.
  • Because the workload models you have constructed represent the frequency of each activity as a percentage of the total load, you should not need to update your models after determining target load levels.
  • Although it frequently is the case that each workload model will be executed at a variety of load levels and that the load level is very easy to change at run time using most load-generation tools, it is still important to identify the expected and peak target load levels for each workload model for the purpose of predicting or comparing with production conditions. Changing load levels even slightly can sometimes change results dramatically.

Prepare to Implement the Model

Implementation of the workload model as an executable test is tightly tied to the implementation method — typically, creating scripts in a load-generation tool. For more information about implementing and validating a test, see Chapter 14 – Test Execution.

Considerations

Consider the following key points when preparing to implement the model:

  • Do not change your model without serious consideration simply because the model is difficult to implement in your tool.
  • If you cannot implement your model as designed, ensure that you record the details about the model you do implement.
  • Implementing the model frequently includes identifying metrics to be collected and determining how to collect those metrics.

Summary

When conducting performance testing with the intent of understanding, predicting, or tuning production performance, it is crucial that test conditions be similar, or at least close, to production usage or projected future business volume.

For test results to be accurate and predictive, the simulated user behavior must model customer sessions based on page flow, frequency of hits, the length of time users pause between pages, and any other factor specific to how users interact with your Web site.
