A Failed Attempt to Optimize Browsing Performance
1. Introduction
In my last post I mentioned how important a good planning phase is for any technology you are going to implement. Planning means not only sizing the right hardware for your needs, but also deciding how you will configure the server so it can handle the traffic your network generates. Sometimes changing the default configuration is necessary, but those changes also need to be planned.
2. Scenario
The example that inspired this post comes from a real scenario where the Firewall Administrator changed the memory used for cache from the default of 10% to 40% (on a server with 1 GB of RAM), as shown in Figure 1.
Figure 1 – Default memory used for cache.
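To put those percentages in perspective, here is a minimal sketch of the arithmetic. It assumes 1 GB means 1024 MB and treats the cache setting as a simple percentage of physical RAM; the function name is only illustrative and is not part of any ISA Server tool.

```python
def cache_memory_mb(total_ram_mb: float, cache_percent: float) -> float:
    """Return the amount of RAM (in MB) set aside for cache."""
    if not 0 <= cache_percent <= 100:
        raise ValueError("cache_percent must be between 0 and 100")
    return total_ram_mb * cache_percent / 100


TOTAL_RAM_MB = 1024  # assumption: the 1 GB server from the scenario, as 1024 MB

for percent in (10, 40):
    reserved = cache_memory_mb(TOTAL_RAM_MB, percent)
    remaining = TOTAL_RAM_MB - reserved
    print(f"{percent}% cache: {reserved:.1f} MB reserved, "
          f"{remaining:.1f} MB left for everything else")
```

Going from 10% to 40% moves the cache from roughly 102 MB to roughly 410 MB, which leaves only about 614 MB for the operating system, the firewall service and everything else on the box.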
As a result, what was initially done with good intentions (the administrator wanted to cache more) caused Internet access downtime, because the server started to run out of resources. The big argument was: the environment had been running just fine for at least a year, so why is the issue happening only now? There are many possible reasons, such as:
· The number of users grew over the year.
· The amount of Internet access increased.
· The number of applications that need Internet access increased.
If you don’t know which of these applies, you will need at least a traffic profile plus a performance baseline of the environment from when it was working fine, so you can compare them with the scenario that fails.
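A baseline comparison can be as simple as the sketch below. The counter names and all the numbers are hypothetical and only illustrate the idea; in practice the values would come from your own perfmon captures, and the growth factor of 2.0 is an arbitrary assumption.

```python
def compare_to_baseline(baseline, current, growth_factor=2.0):
    """Return counters whose current value is at least growth_factor times the baseline."""
    flagged = {}
    for name, base_value in baseline.items():
        # Skip counters missing from the failing capture or with an unusable baseline.
        if name not in current or base_value <= 0:
            continue
        ratio = current[name] / base_value
        if ratio >= growth_factor:
            flagged[name] = ratio
    return flagged


# Hypothetical values, for illustration only (not taken from the real case).
baseline = {"concurrent_users": 200, "pages_per_sec": 50, "avg_disk_queue_length": 1.0}
failing = {"concurrent_users": 260, "pages_per_sec": 400, "avg_disk_queue_length": 6.0}

for name, ratio in compare_to_baseline(baseline, failing).items():
    print(f"{name}: {ratio:.1f}x the baseline")
```

With these made-up numbers the user count grew only modestly, while paging and disk queue length stand out, which is the kind of signal a baseline makes visible.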
In this scenario, when the server was running out of resources, we collected data with the ISA Data Packager in repro mode plus some other pieces of information (such as perfmon). The ISA BPA report clearly showed the error, which was:
Figure 2 – ISA BPA did it again.
The most interesting part is that, because the server was using more and more virtual memory, the amount of paging was huge, which means more disk utilization. Since everything (ISA, cache, logging and the paging file) was on the same disk, we started to see disk queuing; as a result, ISA BPA also alerted us that logging was failing to write to disk:
Figure 3 – Log Write time Excessive warning.
As the log says, if this pattern continues and the write time exceeds 30 seconds, ISA Server will go into lockdown mode, which in this case it did.
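If you sample log write times yourself, a check like the following sketch can tell you how close you are to that limit. The 30-second value comes from the alert described above; the function, the sample list and the way the samples are collected are assumptions for illustration, and this is not how ISA Server itself makes the decision.

```python
LOCKDOWN_LIMIT_SECONDS = 30.0  # limit mentioned in the alert text


def first_sample_over_limit(write_times_seconds, limit=LOCKDOWN_LIMIT_SECONDS):
    """Return the index of the first write time above the limit, or None if there is none."""
    for index, seconds in enumerate(write_times_seconds):
        if seconds > limit:
            return index
    return None


# Hypothetical samples showing write times growing as the disk queue builds up.
samples = [2.5, 8.0, 19.0, 31.5]
position = first_sample_over_limit(samples)
if position is None:
    print("No sample exceeded the limit")
else:
    print(f"Sample {position} exceeded the limit: {samples[position]} seconds")
```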
3. Conclusion
This is a typical scenario where the server was initially sized for a certain load, and the change to the memory dedicated to cache was not accurately planned. The result was complete downtime for Internet access and for external access to the published services. The conclusion is quite simple: planning is a key element in having a stable server and the best experience with the product that you are implementing.