Limiting resource consumption on servers.

In my last blog entry I described the problem of servers overcommitting resources. There are several solutions to this problem. The simplest and most common is to make sure that the critical resource is one that is easily and smoothly controlled by the host OS: CPU time. If the server load is CPU bound, the other resources won’t become a problem. IT operations groups know how to monitor and load balance servers based on CPU utilization.
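As an illustration of that kind of monitoring, here is a minimal sketch (not from the original post) that samples CPU utilization the way an external monitor or load-balancer health check might. It assumes the psutil package; the names and the alert level are hypothetical.

```python
# Hypothetical CPU-utilization probe; psutil and the 90% alert level are
# assumptions made for illustration, not part of the original article.
import time

import psutil

ALERT_LEVEL = 90.0  # percent CPU at which the host is considered overloaded

def sample_cpu(interval: float = 1.0) -> float:
    """Return average CPU utilization (all cores) over the sampling interval."""
    return psutil.cpu_percent(interval=interval)

if __name__ == "__main__":
    while True:
        cpu = sample_cpu()
        status = "OVERLOADED" if cpu >= ALERT_LEVEL else "ok"
        print(f"cpu={cpu:.1f}% status={status}")
        time.sleep(4.0)
```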

A much more sophisticated but general solution is based on controlling the inbound processing rate of requests. It requires monitoring *all* of the resources which can become scarce (memory, database connections, etc.) and throttling the processing of incoming requests when any of these resources gets close to its maximum. This can be accomplished by placing incoming requests into a staging queue, from which they are later picked up and processed. The staging queue itself can be maintained using very few server resources - some memory, very little CPU, and pretty much nothing else, because a request is left in an un-processed state while it sits in the queue. The staging queue length needs to be regulated: ideally, it should be very small when the server is running under optimal load and constant under steady state. It should be allowed to grow when resources get scarce because of an overcommitment. The idea is that this overcommitment is going to be resolved once the few “abnormal” requests have been fully processed.
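To make the mechanics concrete, here is a minimal sketch of such a staging queue in Python. It is not the author’s implementation: psutil is assumed for the memory check, and the thresholds, queue size, and process_request body are placeholders standing in for whatever resources a real server would watch.

```python
# Sketch of the staging-queue approach; psutil, the thresholds, and
# process_request() are illustrative assumptions, not the author's code.
import queue
import threading
import time

import psutil

MAX_QUEUE_LENGTH = 4000            # bound on staged (un-processed) requests
MEMORY_HIGH_WATER = 85.0           # percent of physical memory in use

staging_queue: "queue.Queue" = queue.Queue(maxsize=MAX_QUEUE_LENGTH)

def resources_available() -> bool:
    """True while every scarce resource is below its high-water mark.
    Only memory is checked here; database connections etc. would be added similarly."""
    return psutil.virtual_memory().percent < MEMORY_HIGH_WATER

def stage_request(request) -> bool:
    """Called by the network front end: park the raw request cheaply.
    Staging costs a little memory and almost no CPU, since nothing is processed yet."""
    try:
        staging_queue.put_nowait(request)
        return True
    except queue.Full:
        return False               # even the staging queue is saturated

def process_request(request) -> None:
    """Placeholder for the expensive, resource-hungry work."""
    time.sleep(0.01)

def worker() -> None:
    """Drain the staging queue only while resources are plentiful."""
    while True:
        if resources_available():
            process_request(staging_queue.get())
        else:
            time.sleep(0.1)        # throttle: let in-flight work release resources

threading.Thread(target=worker, daemon=True).start()
```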

This approach works pretty well in the situations where I have seen it deployed, as long as the server process we are talking about is not competing for resources with another process on the same host. I have seen cases where a server is doing great and suddenly the available physical memory on the machine drops because of a competing program. The server goes into throttling mode, but that isn’t effective enough to prevent paging because the amount of memory it has committed cannot decrease as fast as the demand from the competing process grows. When the server starts paging, things come to a standstill because most of the memory pages are hot, containing data for the requests being processed. I have seen people take drastic measures to make up for this: they either abandon the semi-processed state for some requests and put them back on the staging queue, or they serialize the semi-processed state to disk and put the request onto another queue, to be processed once the scarcity is over. Serializing the semi-processed state is difficult because it requires the developer to know exactly, for each request, what needs to be serialized; data shared between requests is a particular problem. Another issue with serialization is that it uses disk resources which may already be scarce, and it may compete with the OS trying to page memory out to disk.
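For a sense of what the serialize-and-requeue workaround looks like, here is a rough sketch under the assumption that the semi-processed state is a plain picklable object with no data shared between requests; all names are hypothetical, and the disk writes themselves consume a resource that may already be scarce.

```python
# Rough sketch of spilling semi-processed state to disk so the request can be
# re-queued and resumed later. Names and the pickle format are assumptions;
# shared data or open handles would not survive this.
import os
import pickle
import tempfile

SPILL_DIR = tempfile.mkdtemp(prefix="spill-")

def spill(request_id: str, state: object) -> str:
    """Write the semi-processed state to disk and return the path,
    so the request can be put on a 'resume later' queue."""
    path = os.path.join(SPILL_DIR, f"{request_id}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path

def resume(path: str) -> object:
    """Load the semi-processed state back once the scarcity is over.
    Note the disk I/O here may compete with the OS paging memory out."""
    with open(path, "rb") as f:
        return pickle.load(f)
```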

I recommend trying out or simulating the basic staging queue approach before going to more sophisticated ones. The POP3 server which I described in my previous article did fine after they implemented the staging queue approach: it normally ran with 100MB of physical memory load; when the physical memory went above 200MB, it stopped processing incoming requests, which queued up on the staging queue. The staging queue had a maximum size of 4000 entries. It is interesting to note that the measured throughput varies more, because it decreases while the staging queue is growing, but the server does not crash anymore!
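Expressed as code, those numbers amount to a very small admission check. This sketch assumes psutil for reading the process’s resident memory and is only an illustration of the thresholds, not the actual POP3 server’s implementation.

```python
# The thresholds from the POP3 example as a simple admission check; reading
# resident memory via psutil is an assumption made for this illustration.
import queue

import psutil

MEMORY_THROTTLE = 200 * 1024 * 1024   # stop pulling work above ~200MB resident
STAGING_QUEUE_MAX = 4000              # maximum staged requests from the example

staging_queue: "queue.Queue" = queue.Queue(maxsize=STAGING_QUEUE_MAX)

def should_process_now() -> bool:
    """True while this process's resident memory is under the throttle point."""
    return psutil.Process().memory_info().rss < MEMORY_THROTTLE
```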