Analyzing HTTP Traffic On Your IIS 7.0 Cluster
If you have ever run an enterprise web site you have probably received a panicked call from a customer or from your Tier 1 monitoring team that your site was responding slowly or throwing 500 errors and that you needed to resolve this issue ASAP. Maybe you are rolling out a new application or software across your site and want to perform side by side analysis of servers to ensure their health.
Over the past year we have rolled out multiple pre-release versions of Windows Server 2008 / IIS 7.0 onto the www.microsoft.com site. During this process we have had to monitor the overall health of the site. Which included being able to quickly identify content/application that were not behaving well on a server, cluster of servers or across the entire site. When running an enterprise of IIS servers it is invaluable to understand the traffic patterns of your site and to have the tools to help identify problem areas. There are many tools in the wild to monitor the health of your enterprise, but I will show you how to use Logparser to harvest your IIS logs and pinpoint a site problem. The goal of this blog is to help you identify issues or anomalous traffic patterns on your IIS web site.
For the purposes of this blog I will use an eight server cluster for our analysis. Sometimes tracking down issues on your site can be like trying to find a needle in a haystack. I like to start by looking at the overall health of the site or for the purposes of my examples, a cluster of servers. Next, I like to drill down into more detail on the server that looks unhealthy. In my examples I will drill down by URI and followed next by the associated window of time that the server/URI was unhealthy.
Here is where the fun begins! The first step is to perform a query by HTTP status codes for the cluster. Using Logparser I created a file called statuscodes_by_servers_in_cluster.sql and fed in the input IIS log files from all the IIS servers in the cluster which are called u_ex07111409.log. This result in an easy to read table showing you the summarized results by HTTP status code for each of the servers in your cluster.
Summarized HTTP status codes by server for a complete a complete cluster
logparser -rtp:-1 file:statuscodes_by_servers_in_cluster.sql?logfilename=u_ex07111409.log
sc-status Srv1 Srv2 Srv3 Srv4 Srv5 Srv6 Srv7 Srv8
--------- ------ ------ ------ ------ ------ ------ ------ ------
200 576313 585078 573956 574218 566388 599807 572982 589815
206 242 238 247 262 236 216 192 246
301 6148 6388 6378 6154 6140 6600 6293 6398
302 166817 168700 166471 165909 163386 172658 166764 170363
304 125493 126640 125908 126204 124139 130669 125855 129026
400 4 3 3 2 2 7 7 3
403 130 106 146 170 148 154 148 95
404 9023 9149 9835 8833 8811 9489 9250 9369
405 0 1 0 0 0 0 0 0
406 2 10 5 11 4 7 8 4
500 1361 342 363 337 332 323 340 351
501 18 22 21 25 31 27 34 30
I created the following Logparser file (statuscodes_by_servers_in_cluster.sql) to create the site by side server table of HTTP status code values.
statuscodes_by_servers_in_cluster.sql
SELECT
sc-status,
SUM(_Srv1) AS Srv1,
SUM(_Srv2) AS Srv2,
SUM(_Srv3) AS Srv3,
SUM(_Srv4) AS Srv4,
SUM(_Srv5) AS Srv5,
SUM(_Srv6) AS Srv6,
SUM(_Srv7) AS Srv7,
SUM(_Srv8) AS Srv8
USING
CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,
CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,
CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,
CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,
CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,
CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,
CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,
CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8
FROM
\\ServerName1\logshare$\%logfilename%,
\\ServerName2\logshare$\%logfilename%,
\\ServerName3\logshare$\%logfilename%,
\\ServerName4\logshare$\%logfilename%,
\\ServerName5\logshare$\%logfilename%,
\\ServerName6\logshare$\%logfilename%,
\\ServerName7\logshare$\%logfilename%,
\\ServerName8\logshare$\%logfilename%
GROUP BY
sc-status
ORDER BY
sc-status
You will notice that the HTTP 500 status codes are rather high on one of our servers (SRV1). Now that we have identified which server looks unhealthy we can drill down and identify the problem application or URI which is causing the high level of 500’s server errors.
The next Logparser query I like to use is a spill by URI for the specific HTTP 500 status code on the cluster. Using Logparser I created a file called cs-uri-stem_by_servers_in_cluster.sql and fed in the same input IIS log files from all the IIS servers in the cluster called u_ex07111409.log.
URI spill for ‘500’ HTTP status code for a complete cluster:
logparser -rtp:-1 file:cs-uri-stem_by_servers_in_cluster.sql?logfilename=u_ex07111409.log
sc-status Srv1 Srv2 Srv3 Srv4 Srv5 Srv6 Srv7 Srv8 cs-uri-stem
--------- ---- ---- ---- ---- ---- ---- ---- ---- ----------------------------------------------
500 0 0 0 0 0 0 0 1 /brasil/pr/2002/ms_edu_minas.stm
500 0 0 0 1 0 0 0 0 /brasil/pr/2002/ms_games_br.stm
500 0 0 0 0 0 1 0 0 /brasil/technet/topicos/codered.stm
500 1 0 0 0 0 0 0 0 /brasil/windows2000/requisitos.stm
500 0 0 0 0 2 0 0 0 /egypt/
500 0 0 0 0 1 0 0 1 /esp
500 0 0 0 0 0 0 0 35 /fwquery/
500 28 9 10 35 21 20 23 31 /gbadapp/errorpages/error.aspx
500 267 160 184 260 270 270 274 275 /ibadapp/errorpages/error.aspx
500 1 0 0 0 0 0 0 0 /middleeast/press/2001/may/gold_cert.stm
500 0 0 0 1 0 0 0 0 /middleeast/press/2001/may/officexpa.stm
500 1060 7 17 38 35 32 38 0 /obadapp/errorpages/error.aspx
500 0 0 0 0 0 0 1 0 /projectserver
500 4 2 1 2 1 0 4 5 /shared/1/navigation.asmx/DisplayDlNavHtml
500 0 0 0 0 2 0 0 1 /shared/2/navigation.asmx/DisplayDlNavHtml
500 0 0 0 0 0 0 0 2 /xml/overview.asp
I created the following Logparser file (cs-uri-stem_by_servers_in_cluster.sql) to create the URI spill by server for the HTTP 500 status code.
cs-uri-stem_by_servers_in_cluster.sql
SELECT
sc-status,
SUM(_Srv1) AS Srv1,
SUM(_Srv2) AS Srv2,
SUM(_Srv3) AS Srv3,
SUM(_Srv4) AS Srv4,
SUM(_Srv5) AS Srv5,
SUM(_Srv6) AS Srv6,
SUM(_Srv7) AS Srv7,
SUM(_Srv8) AS Srv8,
cs-uri-stem
USING
CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,
CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,
CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,
CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,
CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,
CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,
CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,
CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8
FROM
\\ServerName1\logshare$\%logfilename%,
\\ServerName2\logshare$\%logfilename%,
\\ServerName3\logshare$\%logfilename%,
\\ServerName4\logshare$\%logfilename%,
\\ServerName5\logshare$\%logfilename%,
\\ServerName6\logshare$\%logfilename%,
\\ServerName7\logshare$\%logfilename%,
\\ServerName8\logshare$\%logfilename%
WHERE sc-status = 500
GROUP BY
cs-uri-stem,
sc-status
ORDER BY
cs-uri-stem
There are a couple URIs that are throwing the 500 errors on SRV1. The majority of the errors coming from the URI “/obadapp/errorpages/error.aspx”. The next step is to determine if there was a window where this URI/application was performing poorly.
The next Logparser query I like to use is to spill by HTTP status code and by time (in minutes) for the problem server in question. This will determine if the URI / application on ServerName1 (SRV1) was throwing the 500 errors consistently throughout the hour or during a specific window of time. Using Logparser I created a file called httpstatuspivot.sql and fed in the same input IIS log file from ServerName1 called u_ex07111409.log.
Spill of HTTP status codes by time
logparser -rtp:-1 file:httpstatuspivot.sql?logfilename=\\ServerName1\logshare$\u_ex07111409.log
Minute 200s 206s 301s 302s 304s 403s 404s 405s 406s 500s 501s
------------------- ----- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
2007-11-14 09:00:00 11143 3 128 3302 2477 6 158 6 0 3 0
2007-11-14 09:01:00 10659 4 103 3158 2399 2 142 2 0 3 0
2007-11-14 09:02:00 11139 6 129 3215 2410 1 154 1 0 4 0
2007-11-14 09:03:00 10749 1 116 3141 2192 3 161 3 0 4 0
2007-11-14 09:04:00 10814 2 132 3130 2179 2 180 2 0 3 0
2007-11-14 09:05:00 11139 3 134 3240 2527 5 172 5 0 6 0
2007-11-14 09:06:00 11005 8 136 3285 2384 2 189 2 0 8 0
2007-11-14 09:07:00 10913 6 116 3125 2409 6 165 6 0 7 0
2007-11-14 09:08:00 11291 4 136 3267 2315 2 204 2 1 5 0
2007-11-14 09:09:00 10854 5 140 3239 2358 2 162 2 0 5 1
2007-11-14 09:10:00 10219 4 107 2975 2215 1 166 1 1 8 0
2007-11-14 09:11:00 10006 6 124 2854 2050 1 180 1 0 6 0
2007-11-14 09:12:00 10555 5 112 2895 2163 1 160 1 0 7 0
2007-11-14 09:13:00 10404 2 90 3007 2228 0 190 0 0 5 0
2007-11-14 09:14:00 9246 9 103 2839 1852 0 143 0 0 3 0
2007-11-14 09:15:00 9751 4 114 2964 2180 3 135 3 0 4 0
2007-11-14 09:16:00 11884 4 137 3411 2645 2 169 2 0 7 0
2007-11-14 09:17:00 10314 5 94 2838 2203 2 146 2 0 3 0
2007-11-14 09:18:00 9844 2 93 2911 2123 3 132 3 0 6 2
2007-11-14 09:19:00 10080 2 115 2883 2158 1 132 1 0 6 0
2007-11-14 09:20:00 10410 3 114 2993 2309 3 155 3 0 4 0
2007-11-14 09:21:00 10817 3 117 3173 2385 0 188 0 0 4 0
2007-11-14 09:22:00 10313 4 117 3016 2324 3 185 3 0 9 1
2007-11-14 09:23:00 10210 2 107 2792 2058 2 143 2 0 5 0
2007-11-14 09:24:00 10122 6 85 2785 2204 2 156 2 0 6 0
2007-11-14 09:25:00 10852 4 100 3095 2332 3 161 3 0 7 0
2007-11-14 09:26:00 9857 6 100 2911 2100 3 135 3 0 6 0
2007-11-14 09:27:00 9707 4 103 2820 2081 3 164 3 0 8 0
2007-11-14 09:28:00 9858 7 90 2775 2142 6 139 6 0 4 0
2007-11-14 09:29:00 9508 3 77 2767 2064 3 137 3 0 2 0
2007-11-14 09:30:00 8810 3 90 2593 2094 1 110 1 0 11 3
2007-11-14 09:31:00 8913 4 96 2522 1955 1 153 1 0 6 1
2007-11-14 09:32:00 9214 6 112 2789 2176 3 150 3 0 9 0
2007-11-14 09:33:00 9930 3 106 2871 2118 5 140 5 0 4 3
2007-11-14 09:34:00 9136 6 95 2548 1794 3 123 3 0 5 1
2007-11-14 09:35:00 9431 3 106 2734 2136 1 173 1 0 3 0
2007-11-14 09:36:00 9174 13 99 2667 2034 0 116 0 0 5 0
2007-11-14 09:37:00 8292 4 91 2507 1807 1 115 1 0 7 0
2007-11-14 09:38:00 9275 2 82 2496 1911 3 139 3 0 5 0
2007-11-14 09:39:00 8290 3 84 2396 1741 2 116 2 0 108 0
2007-11-14 09:40:00 8802 7 117 2556 2034 5 151 5 0 617 0
2007-11-14 09:41:00 8549 6 90 2332 2002 1 140 1 0 311 1
2007-11-14 09:42:00 8857 5 146 2521 1901 0 167 0 0 1 2
2007-11-14 09:43:00 8200 3 91 2315 1819 1 136 1 0 4 0
2007-11-14 09:44:00 8397 5 92 2388 1953 3 128 3 0 6 0
2007-11-14 09:45:00 8204 1 79 2453 1911 1 124 1 0 4 0
2007-11-14 09:46:00 8979 3 93 2657 2035 1 142 1 0 6 0
2007-11-14 09:47:00 9369 5 79 2757 2066 2 166 2 0 8 0
2007-11-14 09:48:00 8086 2 80 2375 1751 0 148 0 0 5 0
2007-11-14 09:49:00 8119 2 76 2193 1597 2 96 2 0 10 0
2007-11-14 09:50:00 8642 4 104 2494 2066 2 147 2 0 10 0
2007-11-14 09:51:00 8032 4 83 2292 1889 2 147 2 0 7 0
2007-11-14 09:52:00 8349 3 88 2585 1956 1 159 1 0 10 1
2007-11-14 09:53:00 8507 4 88 2589 1792 1 157 1 0 4 0
2007-11-14 09:54:00 9180 2 84 2538 1918 3 154 3 0 3 2
2007-11-14 09:55:00 8475 1 91 2523 1930 3 134 3 0 8 0
2007-11-14 09:56:00 8357 1 83 2428 1855 1 143 1 0 8 0
2007-11-14 09:57:00 8144 5 80 2354 1785 2 130 2 0 7 0
2007-11-14 09:58:00 8930 1 76 2567 1890 2 147 2 0 5 0
2007-11-14 09:59:00 10007 4 98 2971 2111 4 169 4 0 6 0
I created the following Logparser file (httpstatuspivot.sql) to create the spill by time (in minutes) for the server which was throwing the 500 errors. Also, an excellent TechNet article entitled “Inside Microsoft.com - Analyze Web Stats with Log Parser” also covers pivoting output of your Logparser query.
httpstatuspivot.sql
SELECT quantize(to_localtime(to_timestamp(date,time)),60) AS Minute,
SUM([_200]) AS [200s],
SUM([_206]) AS [206s],
SUM([_301]) AS [301s],
SUM([_302]) AS [302s],
SUM([_304]) AS [304s],
SUM([_403]) AS [403s],
SUM([_404]) AS [404s],
SUM([_403]) AS [405s],
SUM([_406]) AS [406s],
SUM([_500]) AS [500s],
SUM([_501]) AS [501s]
USING
CASE sc-status WHEN 200 THEN 1 ELSE 0 END AS [_200],
CASE sc-status WHEN 206 THEN 1 ELSE 0 END AS [_206],
CASE sc-status WHEN 301 THEN 1 ELSE 0 END AS [_301],
CASE sc-status WHEN 302 THEN 1 ELSE 0 END AS [_302],
CASE sc-status WHEN 304 THEN 1 ELSE 0 END AS [_304],
CASE sc-status WHEN 403 THEN 1 ELSE 0 END AS [_403],
CASE sc-status WHEN 404 THEN 1 ELSE 0 END AS [_404],
CASE sc-status WHEN 401 THEN 1 ELSE 0 END AS [_405],
CASE sc-status WHEN 406 THEN 1 ELSE 0 END AS [_406],
CASE sc-status WHEN 500 THEN 1 ELSE 0 END AS [_500],
CASE sc-status WHEN 501 THEN 1 ELSE 0 END AS [_501]
FROM %logfilename%
GROUP BY Minute
ORDER BY Minute
You will notice that during 9:39 through 9:41 this server was not performing well. From our earlier Logparser query the problem URI / application to focus on is “/obadapp/errorpages/error.aspx”.
Now that we have identified the problem URI / application you can engage the team that created the content / application and debug this issue further.
Using Logparser and the samples files above you can analyze the performance of your site as you roll out new content, applications or a new Operation System and ensure that you quickly identify issues during the rollout process.
Comments
- Anonymous
January 01, 2003
PingBack from http://aceddl.cn/x/analyzing-http-traffic-on-your-iis-70-cluster.jsp