Validating Content on Early Releases

As an early adopter of Microsoft products for use on www.microsoft.com Web sites, the Microsoft.com Engineering Operations (MSCOM Ops) Debug team uses multiple tools during the investigation phase, such as TinyGet, the Web Capacity Analysis Tool (WCAT), and Log Parser, along with manual verification for other specific tasks. We use these tools to compare results from the pre-release version of a product against those from the version currently deployed in production, and to validate specific scenarios that reflect goals shared by the Product and Debug teams.

During the investigation phase of the early adoption process, we take a single server out of production, and then install the product build on that server. After we install the new build, we run verification tools against this single computer. We then return the server to production to gather data from the real production load.

As part of our ongoing efforts to use and improve early versions of Microsoft software, it is important that we validate as much of the content as possible before placing a server back into production. A given server may receive up to 60,000 unique URLs in a one-hour period, so manually verifying each URL is not a feasible first step for determining whether something is significantly wrong with the setup. Instead, we rely on a set of simple tools to help us ensure that the server is ready for further validation.

This article provides two examples that illustrate this process:

·         Using TinyGet with Log Parser to perform simple content verification

·         Using WCAT with Log Parser to simulate production load

Log Parser

The key tool that we use in this effort is Log Parser. Although Log Parser has been discussed in many other articles and documents, this article emphasizes the features that we use at www.microsoft.com for content validation. The primary Log Parser feature that we use in this case is its ability to handle input from key sources such as the IIS log and the Windows Event Log.

The second major feature that we use is the less well-known Log Parser template output format (-o:tpl). This feature allows you to easily define a text-based output that conforms to the general form of header, body, and footer. The sample that is installed with Log Parser demonstrates how to use this feature to create HTML pages. The Microsoft.com Debug team uses this feature to create the inputs for the tools we use for content validation.
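
For example, a minimal template has the three-part form shown below. At run time, Log Parser emits the header once, emits the body once for every output row (substituting each %FIELDNAME% placeholder with that row's value), and then emits the tail. The header, body, and field name here are purely illustrative:

<LPHEADER>
Checked URLs:
</LPHEADER>
<LPBODY>
%URI%
</LPBODY>
<LPTAIL>
End of list.
</LPTAIL>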

TinyGet

TinyGet.exe is a command-line HTTP client that is available in the Internet Information Services (IIS) 6.0 Resource Kit. While its feature set is broad, the key feature that we leverage is its ability to take a text file as a script input. Although TinyGet is excellent for quickly validating content, its ability to act as a stress client is limited by the implementation of its script and multi-connection features. For example, TinyGet executes each request serially rather than spreading the requests across its threads of execution.
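
For illustration, a TinyGet script is simply a text file in which each line supplies the same parameters that you would otherwise type at a command prompt. A minimal two-request script (the paths here are hypothetical) might look like the following:

-uri "/default.aspx" -status:200 -verb:GET
-uri "/library/toolbar.js" -status:200 -verb:HEAD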

For more information about how to download the IIS 6.0 Resource Kit, see "Internet Information Services (IIS) 6.0 Resource Kit Tools".

WCAT

For applying load, we depend on WCAT 6.3 as our key stress tool. Note that, as a general stress client, WCAT may not fulfill everyone’s needs. Many people find that the load test tool (available in the Microsoft® Visual Studio® 2005 Team System) is superior to WCAT, especially when support is needed for complex HTTP transactions. However, for simulating a load of GET requests (instead of the more dynamic POST requests), WCAT may fulfill your needs.

For more information about the Visual Studio 2005 Team System load test tool, see "Report Visual Studio Team System Load Test Results Via A Configurable Web Site".

One of the major advantages of WCAT in our environment is the xcopy deployment nature of the tool, which allows us to avoid running a full product installation. (Xcopy deployment means installing by using the XCopy command-line tool, or another utility with similar file-copying capabilities, to copy files.) As a result, preparing a new server installation for validation comes down to copying out a handful of executable files, Log Parser queries and templates, and a batch file or two.
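
For example, assuming the toolset lives on a file share (the share and target paths below are hypothetical), deployment can be a single command:

xcopy \\toolserver\validation\*.* C:\Validation\ /s /y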

For more information about how to download WCAT 6.3, see "WCat 6.3 (x86)" and "WCat 6.3 (x64)".

Using TinyGet with Log Parser to Perform Simple Content Verification

As mentioned earlier, the first step in dogfood validation is to use TinyGet to validate the content. The script that we pass to TinyGet uses the same parameters that you would use at a command prompt. Fortunately, you can mix command-line parameters and script-based parameters: the command-line parameters become default values that you can override in your script. Because the basic approach is to use Log Parser to parse the logs and print the output through a template, all that remains is to implement the query and the template.

This example focuses only on pages that return HTTP 200, 206, or 304 status codes. It also simulates only HTTP GET and HEAD requests because we do not have the data to post, and we do not want to affect production databases by accidentally inserting data. Based on these requirements, we use the following Log Parser query, which produces just the URL (both cs-uri-stem and cs-uri-query), the verb, and the status code:

SELECT
        cs-method AS verb,
        REPLACE_CHR(CASE realQueryString
                WHEN NULL THEN URLESCAPE(cs-uri-stem)
                WHEN '' THEN URLESCAPE(cs-uri-stem)
                ELSE STRCAT(
                        STRCAT(
                                URLESCAPE(cs-uri-stem),
                                '?'
                        ),
                        realQueryString
                      )
        END,'\\','\\\\') AS URI,
        COUNT(*) AS WEIGHT,
        CASE sc-status WHEN 304 THEN 200 WHEN 206 THEN 200 ELSE sc-status END AS STATUSCODE
USING
        EXTRACT_TOKEN(cs-uri-query,0,'|') AS realQueryString  // Note that ASP Classic will add values to the query string.
INTO %outfile%
FROM %logfile%
WHERE
        cs-method IN ('GET';'HEAD')
        AND STATUSCODE IN (200)
GROUP BY verb, URI, STATUSCODE

Depending on your application's implementation, you may need to add output for s-port, cs-host, or other related routing fields.
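
For example, a simplified sketch of such a query (the query-string handling shown above is omitted for brevity) might add both fields as follows; how you then feed the extra fields to your validation tool depends on your template:

SELECT
        cs-host AS HOST,
        s-port AS PORT,
        cs-method AS verb,
        URLESCAPE(cs-uri-stem) AS URI,
        COUNT(*) AS WEIGHT
INTO %outfile%
FROM %logfile%
WHERE cs-method IN ('GET';'HEAD') AND sc-status = 200
GROUP BY HOST, PORT, verb, URI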

We use the following simple matching template to produce the TinyGet.exe script:

<LPBODY>-uri "%URI%" -status:%STATUSCODE% -verb:%verb%
</LPBODY>

At execution time, Log Parser prints the content between <LPBODY> and </LPBODY> once for every row of output, replacing the %URI%, %STATUSCODE%, and %verb% placeholders with the fields from that row. Note that you can also set environment variables and insert them in the same way. You may also notice that the template does not appear to use the WEIGHT field. We added this field so that, if we decided to produce only a given number of requests, we could add the TOP modifier to the Log Parser SELECT statement and still get the most frequently requested URLs.
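
For example, a trimmed-down sketch of such a TOP query (again omitting the query-string handling for brevity) might read:

SELECT TOP 1000
        cs-method AS verb,
        URLESCAPE(cs-uri-stem) AS URI,
        COUNT(*) AS WEIGHT,
        CASE sc-status WHEN 304 THEN 200 WHEN 206 THEN 200 ELSE sc-status END AS STATUSCODE
INTO %outfile%
FROM %logfile%
WHERE cs-method IN ('GET';'HEAD') AND STATUSCODE IN (200)
GROUP BY verb, URI, STATUSCODE
ORDER BY WEIGHT DESC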

Now that we have a query and a template, all we need to do is execute Log Parser, and then run TinyGet.exe by using the output. Assuming that you save the query as TinyGet.sql and the template as TinyGet.tpl, the command you execute is:

Logparser.exe file:TinyGet.sql?logfile=u_ex*.log+outfile=TinyGet.txt -o:tpl -tpl:TinyGet.tpl

This command creates a file that we can then pass to TinyGet.exe by executing the following command:

Tinyget.exe -server:localhost -z:TinyGet.txt

To get a more realistic client experience, we usually add the correct host name and a valid user agent to the command line by using the -rh (request header) parameter, as demonstrated in the following command:

Tinyget.exe -server:localhost -z:TinyGet.txt -rh:"Host: www.microsoft.com\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\r\n"

As previously discussed, because we add the host name and user agent at the command line, they become the default settings for all of the requests that TinyGet.exe makes. Because the script specifies the -status parameter for each request, the only requests that print anything are those that actually fail by returning a non-matching HTTP status code.
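
Putting these steps together, the entire first-pass verification fits in a small batch file, sketched below (the file names match the examples above; the location of the IIS logs will vary with your configuration):

@echo off
rem Generate the TinyGet script from the current IIS logs.
Logparser.exe file:TinyGet.sql?logfile=u_ex*.log+outfile=TinyGet.txt -o:tpl -tpl:TinyGet.tpl

rem Replay the logged requests; only status-code mismatches produce output.
Tinyget.exe -server:localhost -z:TinyGet.txt -rh:"Host: www.microsoft.com\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\r\n"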

Using WCAT with Log Parser to Simulate Production Load

After you have verified that your content works for a single request, you can start stressing the content. One of the main reasons we use WCAT is because it uses text files for configuration. Therefore, all we need to do to simulate load is create another Log Parser template and query.

We can use the sample configuration file (Home.ubr) that ships with WCAT as the basis for our template:

scenario
{
    name    = "IIS Home Page";

    warmup      = 30;
    duration    = 120;
    cooldown    = 10;

    /////////////////////////////////////////////////////////////////
    //
    // All requests inherit the settings from the default request.
    // Defaults are overridden if specified in the request itself.
    //
    /////////////////////////////////////////////////////////////////
    default
    {
        // Send the keep-alive header.
        setheader
        {
            name    = "Connection";
            value   = "keep-alive";
        }

        // Set the host header.
        setheader
        {
            name    = "Host";
            value   = server();
        }

        // HTTP/1.1 request.
        version     = HTTP11;

        // Keep the connection alive after the request.
        close       = ka;
    }

    //
    // This script is made for IIS 7.0.
    //
    transaction
    {
        id = "Default Web Site home page";
        weight = 1;

        request
        {
            url         = "/";
            statuscode  = 200;
        }

        request
        {
            url         = "/welcome.png";
            statuscode  = 200;
        }

        //
        // Specifically close the connection after both files are requested.
        //
        close
        {
            method      = reset;
        }
    }
}

To create the template, all that is required is to replace the variable parts of the .ubr file with field variables, and to add the <LPHEADER>, <LPBODY>, and <LPTAIL> tags that mark the Log Parser header, body, and tail:

<LPHEADER>
scenario
{
    name    = "Generated Using Log Parser";

    warmup      = 30;
    duration    = 120;
    cooldown    = 10;
    default
    {
        setheader
        {
            name    = "Host";
            value   = "www.microsoft.com";
        }
        setheader
        {
            name    = "Connection";
            value   = "keep-alive";
        }

        setheader
        {
            name    = "User-Agent";
            value   = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
        }
        version     = HTTP11;
        close       = ka;
    }
</LPHEADER>
<LPBODY>
    transaction
    {
        id = "URL %ID%";
        weight = %WEIGHT%;

        request
        {
            url         = "%URI%";
            statuscode  = %STATUSCODE%;
        }
    }
</LPBODY>
<LPTAIL>
}
</LPTAIL>

To keep things simple, we use a single request for each transaction. You can also link requests together by using the cs(Referer) field from the IIS log; however, doing so would require you to use Log Parser programmatically instead of the simple SELECT statement and template output format shown here.

The final step is to create a matching Log Parser query that generates the necessary fields. We focus on only four fields in this step:

·         ID

·         URI

·         WEIGHT

·         STATUSCODE

We use the Log Parser OUT_ROW_NUMBER() function to create ID because it is nothing more than a unique identifier. WCAT sums the weights of all the transactions, and then distributes requests based on the proportion of each transaction's weight to that sum. Therefore, the WEIGHT field is simply the number of times the URL was requested, which means that we can use Log Parser's COUNT(*) aggregate function. The URI field is also straightforward; the only trick is ensuring that we append the query string when appropriate. Finally, assuming the same restrictions apply as before, we could simply put 200 in the template instead of the %STATUSCODE% variable. However, we use the variable in case we want to expand to non-200 cases in the future.

The following example illustrates how the Log Parser query appears, based on this logic:

SELECT
        OUT_ROW_NUMBER() AS ID,
        REPLACE_CHR(CASE realQueryString
                WHEN NULL THEN URLESCAPE(cs-uri-stem)
                WHEN '' THEN URLESCAPE(cs-uri-stem)
                ELSE STRCAT(
                        STRCAT(
                                URLESCAPE(cs-uri-stem),
                                '?'
                        ),
                        realQueryString
                      )
        END,'\\','\\\\') AS URI,
        COUNT(*) AS WEIGHT,
        CASE sc-status WHEN 304 THEN 200 WHEN 206 THEN 200 ELSE sc-status END AS STATUSCODE
USING
        EXTRACT_TOKEN(cs-uri-query,0,'|') AS realQueryString  // Note that ASP Classic adds values to the query string.
INTO %outfile%
FROM %logfile%
WHERE
        cs-method = 'GET'
        AND STATUSCODE = 200
GROUP BY URI, STATUSCODE
ORDER BY WEIGHT DESC

Note that this query is very similar to the one that we used to produce the TinyGet script. However, we do not use the verb parameter in this query because, according to the product documentation, WCAT does not currently support the HEAD verb.

After we have verified that our query and our template match (saved as WCat.sql and WCat.tpl respectively), we run Log Parser to produce the CurrentLog.ubr file, and then run WCAT to start applying stress:

Logparser file:WCat.sql?logfile=u_ex*.log+outfile=CurrentLog.ubr -o:tpl -tpl:WCat.tpl

start wcctl.exe -t currentlog.ubr -s localhost -c 1 -v 100
start wcclient.exe localhost

After you run these commands, you should see stress being applied to your server. Of course, this is when the challenging part of the process begins:

·         Determining what, if anything, is wrong (one starting point is sketched after this list)

·         Determining whether the performance of the system will meet your needs
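
For the first item, one simple starting point (a sketch, not part of our standard toolset) is to use the Windows Event Log input format mentioned earlier to list any errors that were logged while the stress was running:

Logparser.exe "SELECT TimeGenerated, SourceName, Message INTO Errors.txt FROM System WHERE EventTypeName = 'Error event'" -i:EVT -o:NAT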

Summary

In this article, we described how the Microsoft.com Engineering Operations Debug team uses Log Parser in conjunction with TinyGet and WCAT to perform simple content validation and to simulate production load. We hope you have seen a new way to use some tools that have been around for a long time. Based on the examples in this article, you should also be able to apply the technique of using Log Parser templates to turn logs into inputs for other validation tools that use text-based configuration files (for example, .csv and .xml).

Additional References

For more information about the Microsoft.com Engineering Operations team, see "Introducing the Microsoft.com Engineering Operations Team".

For more information about how the Microsoft.com Engineering Operations team adopts pre-release versions of products, see "About Early Technology Adoption (Dogfooding)" and "Migrating a Large, High-Volume Web Site to Internet Information Services 7.0".