Yes, We Recycle!...That Includes AppPools

And I am not just talking about Aluminum and Polystyrene. Application Pool recycling is one great feature of IIS6 but how do you know if it is helping you or hurting you? It’s great for availability and reliability but it can also be masking real problems with code running in your applications. If you are recycling too quickly based on memory limits you may have a memory leak that should be looked at, if you are recycling due to ping fails or other errors, you may have a more serious issue to look at.

  We monitor our IIS worker process uptime to measure general health of our applications and make capacity planning decisions (how many apps per app pool, etc). Process startup and .Net framework initialization does have a performance cost, If you are recycling too frequently, this can affect your web servers throughput. We’ve had our share of misbehaving applications that have recycled as frequently as every 5 minutes or just flat out crashed the worker process. You can set Application pools to recycle based on time, number of requests, virtual and private memory limits. Application pools may also recycle due to health reasons, Ping Fail, Fatal Communications Error, etc., or there’s always the admin recycle or the config change recycle. We have many applications (sometimes hundreds) spread out across several app pools. We use our WP uptime data to check if we have a misbehaving application out there (maybe someone published some bad code) or make decisions on how many app pools we can run, which applications to put into separate app pools and how many applications an App Pool should have.

How do you verify and monitor that is really happening?

Well, the first thing you have to do is log all recycle events before you can start collecting and analyzing the data. To enable recycle event logging, if you haven’t already:

cscript adsutil.vbs Set w3svc/AppPools/LogEventOnRecycle 255

This enables all recycle events for all AppPools to be logged in the system event log.

More information on Application Pool recycling events and logging is available in the following KB article:

332088 How to modify Application Pool Recycling events in IIS 6.0

https://support.microsoft.com/default.aspx?scid=kb;EN-US;332088

(note: there may be an error in the above article, I use event ID 1117 for private memory)

Once you have logging enabled you can start collecting this info and creating baselines for your application pools. One of our favorite tools, Log Parser, is great for collecting this data.

For more information on Log Parser:

https://www.microsoft.com/technet/scriptcenter/tools/logparser/default.mspx

You can use a simple log parser query to get the recycle events:

Logparser "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime from \\SERVERNAME\System where SourceName = 'W3SVC' and EventID in (1009;1010;1011;1074;1077;1078;1079;1080;1117) and Message like '%DefaultAppPool%'"

This query will give all the recycle and process failure events for the application pool named like defaultapppool. I’ve included a few other events in this query that aren’t necessarily recycle events but worker process failure events that do affect the uptime of your worker process.

Here is a list of events that I am interested in:

1009 = Worker process Terminated UnExpectedly

1010 = Worker process failed a Ping

1011 = Worker process suffered a Fatal Comummincations Error

1074 = Recycled based on the defined Time Limit

1077 = Recycled based on the Virtual Mem Limit

1078 = An ISAPI reported unhealthy

1079 = An Admin requested a recycle

1080 = A Config Change required a recycle

1117 = Recycled based on Memory limit

That’s a good start but I want to see why we recycled as well, so this query is a little better:

Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime,

case

            EventID

            when 1009 then 'UnExpEnd'

            when 1010 then 'PingFail'

            when 1011 then 'FatalComErr'

            when 1074 then 'TimeLimit'

            when 1077 then 'VMem'

            when 1078 then 'ISAPIUnHealth'

            when 1079 then 'Admin'

            when 1080 then 'ConfigChange'

            when 1117 then 'PMem'

end as Reason

     from \\SERVERNAME\System

            where SourceName in ('W3SVC';'WAS') and

        EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and

            Message like '%defaultapppool%' and

            TimeGenerated > to_timestamp('01-01-2006 00:01:01','yyyy-MM-dd hh:mm:ss')

Save this as wpuptime.sql and run “logparser file:wpuptime.sql” and this will give you an ouput like:

dateTime Reason

------------------- ------------

09/23/2006 01:18:51 ConfigChange

09/23/2006 01:19:52 VMem

09/23/2006 01:20:04 ConfigChange

09/28/2006 09:28:03 Admin

10/04/2006 18:24:16 ConfigChange

10/05/2006 07:50:02 PingFail

10/05/2006 08:21:26 PingFail

That’s better, I can see event date/time and reason but I still want more info, like what is the average up time for the app pool that I am interested in? What was the shortest recycle time, what was the longest, what are the count of each events? To go further with this, I turn to vbscript and invoke the log parser COM object. To use the log parser COM object you must register the DLL on the machine where you will run the script.

Regsvr32 logparser.dll

Here’s an example script that uses the above query and calculates the info, it doesn’t have much error checking and output isn’t exactly pretty, but I did say it was an example right?

' WPUPTIME.VBS - Queries event logs for W3SVC Recycle events (last 100) and calculates up time stats between events

' Requires LogParser.dll to be registered

' 1009 = TermUnExp

' 1010 = PingFail

' 1011 = FatalComErr

' 1074 = TimeLimit

' 1077 = VMem

' 1078 = ISAPIUnhealthy

' 1079 = Admin

' 1080 = ConfigChange

' 1117 = PMem

'

Dim cConfChg, cAdmin, cISAPI, cVMem, cPMem, cTimeLimit, cFatalComErr, cPingFail, cTermUnExp

Dim myQuery

Dim myInputFormat

Dim MinDate

Dim recordSet

Dim lastD, curD, oldD, firstD

Dim uptime_avg, uptime_mins, uptime_mins_last

Dim reccount

Dim timetotal

Dim MinTime

Dim MaxTime

Dim strReason, strReasonLast

Dim strComputer

Dim strAppPool

Dim szQuery

WScript.Echo

ParseCommandLine()

MinDate = CStr("01-01-2006 00:01:01")

Set myQuery = CreateObject("MSUtil.LogQuery")

Set myInputFormat = CreateObject("MSUtil.LogQuery.EventLogInputFormat")

myInputFormat.direction = "BW"

Call QueryServer(strComputer, strAppPool)

           

CalcUpTime()

                                               

Wscript.Echo strComputer, "," , strAppPool, "Avg Uptime:", uptime_avg, "minutes"

WScript.Echo

WScript.Echo "Count of Recycle Events"

WScript.Echo "VirtualMem:" + Chr(9)+ CStr(cVMem)

WScript.Echo "PrivateMem:" + Chr(9) + CStr(cPMem)

WScript.Echo "TimeLimit:" + Chr(9) + CStr(cTimeLimit)

WScript.Echo "PingFail:" + Chr(9) + CStr(cPingFail)

WScript.Echo "Admin:" + Chr(9)+ Chr(9) + CStr(cAdmin)

WScript.Echo "FatalComErr" + Chr(9) + CStr(cFatalComErr)

WScript.Echo "TermUnExp" + Chr(9) + CStr(cTermUnExp)

WScript.Echo "ConfChg" + Chr(9)+ Chr(9) + CStr(cConfChg)

WScript.Echo "ISAPI" + Chr(9) + Chr(9) + CStr(cISAPI)

WScript.Echo

Set myInputFormat = Nothing

Set myQuery = Nothing

set Locator = Nothing

WScript.Quit(0)

Sub CalcUpTime()

WScript.Echo("RecycleTime Uptime(mins) Reason")

reccount = 0

timetotal = 0

MinTime = 1000

MaxTime = 0

cConfChg = 0

cAdmin = 0

cISAPI = 0

cVMem = 0

cPMem = 0

cTimeLimit = 0

cFatalComErr = 0

cPingFail = 0

cTermUnExp = 0

Do While recordSet.atEnd() <> True

           

            Set record = recordSet.getRecord()

            curD = CDate(record.GetValue(0))

            strReason= CStr(record.GetValue(1))

           

            If reccount <> 0 then

                        uptime_mins = DateDiff("n", curD, lastD)

                        WScript.Echo lastD, " ", uptime_mins, strReasonLast

                        timetotal = timetotal + uptime_mins

           

                        if MinTime > uptime_mins Then

                                                MinTime = uptime_mins

                        End if

                       

                        if MaxTime < uptime_mins Then

                                                MaxTime = uptime_mins

                        End if

                       

            Else

                        firstD = CurD

            End IF

            Select Case strReason

                        Case "VMem" cVMem = cVMem + 1

                        Case "PMem" cPMem = cPMem + 1

                        Case "Admin" cAdmin = cAdmin + 1

                        Case "ISAPIUnHealth" cISAPI = cISAPI + 1

                        Case "TimeLimit" cTimeLimit = cTimeLimit + 1

                        Case "PingFail" cPingFail = cPingFail + 1

                        Case "FatalComErr" cFatalComErr = cFatalComErr + 1

                        Case "UnExpEnd" cTermUnExp = cTermUnExp + 1

                        Case "ConfigChange" cConfChg = cConfChg + 1

            End Select

           

            lastD=curD

            reccount = reccount + 1

            strReasonLast = strReason

            uptime_mins_last = uptime_mins

           

            recordSet.moveNext()

            If Err.number <> 0 Then

                        WScript.Echo "Error in CalcUpTime: ", Err.Description

                        Err.Clear

                        Exit Do

            End If

           

Loop

                       

oldD = lastD

uptime_avg = Round(timetotal / (reccount-1))

WScript.Echo ""

WScript.Echo "Time Span: " + CStr(oldD) + " through " + CStr(firstD )

WScript.Echo "MinTime:", MinTime

WScript.Echo "MaxTime:", MaxTime

WScript.Echo ""

End Sub

Sub QueryServer(tmpComp, tmpAppPool)

szQuery = "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime, case EventID when 1077 then 'VMem' when 1117 then 'PMem' when 1079 then 'Admin' when 1078 then 'ISAPIUnHealth' when 1074 then 'TimeLimit' when 1010 then 'PingFail' when 1011 then 'FatalComErr' when 1009 then 'UnExpEnd' when 1080 then 'ConfigChange' end as Reason" + _

     " from \\" + tmpComp + "\System where SourceName in ('W3SVC';'WAS') and EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and Message like '%"+tmpAppPool+ _

            "%' and TimeGenerated > to_timestamp('"+MinDate+"','yyyy-MM-dd hh:mm:ss')"

WScript.Echo "Querying", tmpComp, tmpAppPool, " events"

WScript.Echo

Set recordSet = myQuery.Execute(szQuery, myInputFormat)

If Err.number <> 0 then

            WScript.Echo "Could not execute query in QueryServer: ", Err.Description

End if

End Sub

Sub ParseCommandLine()

  Dim vArgs

  set vArgs = WScript.Arguments

 

  if vArgs.Count <> 2 then

      DisplayUsage()

  Else

     strComputer = vArgs(0)

     strAppPool = vArgs(1)

  End if

End Sub

Sub DisplayUsage()

 WScript.Echo "Usage: cscript.exe " & WScript.ScriptName & " ServerName AppPoolName" & vbLF & vbLF & _

     "Example: " & vbLF & _

      WScript.ScriptName & " MyServer DefaultAppPool" & vbLF

      WScript.Echo

 WScript.Quit(0)

End Sub

The above script takes two parameters to run – the servername and the app pool name to match. The query uses a LIKE comparison, so make sure your app pool name given is unique enough to get correct data. Here is the sample outoput (abbreviated):

RecycleTime Uptime(mins) Reason

10/18/2006 1:49:40 PM 304 PMem

10/18/2006 8:45:56 AM 535 PMem

10/17/2006 11:50:04 PM 592 PMem

10/17/2006 1:58:28 PM 842 PMem

10/16/2006 11:56:41 PM 14973 Admin

… (cut for brevity)

9/27/2006 9:06:14 AM 14 VMem

9/27/2006 8:52:02 AM 96 VMem

Time Span: 9/27/2006 7:16:08 AM through 10/18/2006 1:49:40 PM

MinTime: 1

MaxTime: 14973

MyServer , MyAppPool Avg Uptime: 309 minutes

Count of Recycle Events

VirtualMem: 81

PrivateMem: 6

TimeLimit: 0

PingFail: 11

Admin: 1

FatalComErr 0

TermUnExp 0

ConfChg 1

ISAPI 0

It prints out the date/time of the event, minutes elapsed since the previous event and the reason. Followed by the time span of the events collected (first/last event used). It gives you minimum, maximum and average uptime of the application pool and a count of the various events the query found.

If you see you are recycling based on your time limits, you probably have a pretty healthy app. Looking at the above output we can see the majority of the recycling is due to memory related limits with a few ping fails. This app pool may be have a memory leak, may be using ASP.Net cache excessively or could just have too much content to be contained in a single app pool process (ASPX content is compiled into DLL’s that consume virtual memory), more investigation may be required.