Part 1: Anatomy of SQL Server 2008 Resource Governor CPU Demo

First, a quick introduction. My name is Boris Baryshnikov and I'm a program manager in SQL Server Database Engine group. As of today, Resource Governor is the most recent large project I worked on. Not surprisingly, I will talk about the Resource Governor here.

While Resource Governor spans multiple components of SQL Server Engine, it seems logical to discuss the feature in this blog as the resource management in general is so close to the scope of SQLOS.

Without further delay, let's get to point. As we have recently released a public CTP (Community Technology Preview) of SQL Server 2008 (aka SQL Server 2008 November CTP), which has Resource Governor functionality, and a number of presentations with Resource Governor demo on various conferences, I started getting a lot of questions about the demo itself which made me write this post.

We created this demo when Resource Governor was only a prototype to illustrate the concept, but probably its simplicity made it so attractive that it was included in a number of talks and demos by different people all over the world.

Background

If you are looking for general concepts about Resource Governor, check out the Books Online which comes as a separate download or read it directly on MSDN Library.

The Demo in a Nutshell

In the demo we will create two workload groups which share a common resource pool and one workload group which has its own dedicated resource pool. Each group gets a CPU intensive workload. Using the Resource Governor you will be able to affect CPU distribution between these competing workloads. The effects of the resource management are observed using performance counters specific to Resource Governor.

Step 1: Initial Demo Setup

On a newly installed server (i.e. no prior Resource Governor configuration) you may need to run these setup steps. I normally doing my demo on a dual core laptop and for the sake of simplicity I'm using a single CPU for SQL Server. To do so, I adjust CPU affinity mask as follows:

-- first enable advanced options in sp_configure

sp_configure 'show advanced', 1

GO

RECONFIGURE

GO

-- use only 1 CPU on demo machine

sp_configure 'affinity mask', 1

GO

RECONFIGURE

GO

 

Using 1 CPU for SQL Server on a dual proc machine has an interesting side-effect: we normalize "CPU usage %" counter to number of CPUs on the box and thus, the values will hover around 50% as maximum and not 100% as you might expect. I will illustrate this below.

Of course you can use both CPUs but this complicates the demo. This is a subject for whole another post - what happens when you have multiple CPUs.

In addition to that for demo purposes I will set min/max server memory to a fixed value, since it will improve predictability of the demo on the laptop.

-- 512 Mb is suitable for laptop demo

sp_configure 'min server', 512

GO

sp_configure 'max server', 512

GO

RECONFIGURE

GO

Step 2: Workload groups and Resource Pools

Now we will be setting up the following hierarchy of workload groups and resource pools:

Resource Governor Group and Pool hierarchy

Each corresponding workload group contains queries of the corresponding class or department (i.e. Marketing, Adhoc, and VP). Note that, Marketing and Adhoc queries share the same resource pool, while workload group VP has its own similarly named pool. The reason of such separation will become clear as we see how we adjust Resource Governor controls.

You will start building the above configuration in a bottom up manner (i.e. starting from pools and going up)

To do so, we execute the following T-SQL:

-- create user pools

-- note that we are using all default parameters

CREATE RESOURCE POOL PoolMarketingAdhoc

CREATE RESOURCE POOL PoolVP

-- create user groups

-- also note that all groups created with default parameters

-- only pointing to the corresponding pools (and not 'default')

CREATE WORKLOAD GROUP GroupMarketing

USING PoolMarketingAdhoc

CREATE WORKLOAD GROUP GroupAdhoc

USING PoolMarketingAdhoc

CREATE WORKLOAD GROUP GroupVP

USING PoolVP

GO

Step 3: Classification

Now, what you have just done is created hierarchy of the groups and pools, however, how does the server know about which query goes where? This is where classification comes in. The above picture becomes:

Group and Pool hierarchy with classification

There is a couple of things:

  1. To do the classification you will need to create a user-defined function that will be executed for every new connection and it will place these new connections in the corresponding workload groups.
  2. How will we separate different connections? For demo purposes we will use 3 separate login names which we will check and use inside of the function

To implement the above 2 steps we will run the following:

-- classifier function should be created in master database

-- switch to master unless you are there already

USE master

GO

-- create logins to separate users into different groups

-- note that we disabled strong password checking for demo purposes

-- but this is against any best practice

CREATE LOGIN UserMarketing WITH PASSWORD = 'UserMarketingPwd', CHECK_POLICY = OFF

CREATE LOGIN UserAdhoc WITH PASSWORD = 'UserAdhocPwd', CHECK_POLICY = OFF

CREATE LOGIN UserVP WITH PASSWORD = 'UserVPPwd', CHECK_POLICY = OFF

GO

-- now create the classifier function

IF OBJECT_ID('DBO.CLASSIFIER_V1','FN') IS NOT NULL

       DROP FUNCTION DBO.CLASSIFIER_V1

GO

-- note that this is just a regular function

CREATE FUNCTION CLASSIFIER_V1 ()

RETURNS SYSNAME WITH SCHEMABINDING

BEGIN

       DECLARE @val varchar(32)

       SET @val = 'default';

       if 'UserVP' = SUSER_SNAME()

              SET @val = 'GroupVP';

       else if 'UserMarketing' = SUSER_SNAME()

              SET @val = 'GroupMarketing';

       else if 'UserAdhoc' = SUSER_SNAME()

              SET @val = 'GroupAdhoc';

       return @val;

END

GO

Step 4: Are we there yet?

After all this work, can we start workloads and see what happens? The answer, as you have guessed by the question is - no. What's left? Again, a couple of steps:

  1. We need to tell Resource Governor to use the function that we just created
  2. Make all the changes effective

First step is done by

-- make function known to the Resource Governor

ALTER RESOURCE GOVERNOR

WITH (CLASSIFIER_FUNCTION = dbo.CLASSIFIER_V1)

GO

For the second step, let's compare output of catalog views with in-memory information (note difference in names of catalog views and dynamic management views (DMVs) which are prefixed with dm_:

-- metadata information

SELECT * FROM sys.resource_governor_workload_groups

SELECT * FROM sys.resource_governor_resource_pools

SELECT * FROM sys.resource_governor_configuration

 

-- in-memory information

SELECT * FROM sys.dm_resource_governor_workload_groups

SELECT * FROM sys.dm_resource_governor_resource_pools

SELECT * FROM sys.dm_resource_governor_configuration

Now transfer changes from metadata to memory by running the following statement. Also, do not confuse it with already existing RECONFIGURE command:

-- make the changes effective

ALTER RESOURCE GOVERNOR RECONFIGURE

GO

And rerun the above query on metadata and DMVs and you should see that new groups, pools and classifier function ID are present in corresponding DMVs.

Step 5: Running the workloads

The easiest way to simulate a CPU intensive workload is to run the following in a loop:

set nocount on

declare @i int

declare @s varchar(100)

 

set @i = 100000000

 

while @i > 0

begin

       select @s = @@version;

       set @i = @i - 1;

end

Also, instead of running this query from the Management Studio, consider saving it in a file and running from a command prompt by using a script similar to the below. Note that we are using 3 different user names to connect to the server.

echo "Press any key to start Marketing workload"

pausestart sqlcmd -S <your_server_name> -U UserMarketing -P UserMarketingPwd -i "CPU intensive loop.sql"

echo "Press any key to start VP workload"

pause

start sqlcmd -S <your_server_name> -U UserVP -P UserVPPwd -i "CPU intensive loop.sql"

echo "Press any key to start Adhoc workload"

pause

start sqlcmd -S <your_server_name>" -U UserAdhoc -P UserAdhocPwd -i "CPU intensive loop.sql"

To observe the effects of the load, add the following performance counters in the perfmon:

  1. We will monitor CPU usage per group in the 1st instance of perfmon; add "SQLServer:Workload Group Stats object", "CPU usage %" counter for "GroupMarketing", "GroupAdhoc" and "GroupVP" instances
  2. We will monitor CPU usage per pool 2nd instance of perfmon, add "SQLServer:Resource Pool Stats object", "CPU usage %" counter for "PoolMarketingAdhoc" and "GroupVP" instances

Before you start the next workload, observe the counters for pools and groups for a number of seconds, you should see approximately the following:

For groups (click on the image to open in a new window):

CPU usage % counters for groups

For pools:

CPU usage % for pools

Few things to note:

  1. Remember I set affinity mask to 1? This is why the maximum on the figures is only 50% (it is for a single CPU while usage is normalized to all CPUs on the machine, 2 in my case)
  2. At point A, I started Marketing workload, CPU usage went to the maximum on the CPU for a single workload, 50% in this case
  3. At point B, I started VP workload, which, as you remember, belongs to a separate resource pool and it made Marketing group share half of the CPU with VP group. The same is true for the pools (because we have 1 to 1 match of active requests in groups to pools at this point)
  4. At point C, I started Adhoc workload, which has its own group but shares the pool with Marketing workload. What happened here on the groups is that all 3 of them are now sharing the CPU getting approximately 1/3 of it or roughly 17%. On the pool side, however, we can see that MarketingAdhoc pool which shows aggregate resource usage by all groups inside of it, has 2/3 or the CPU while PoolVP has only 1/3. This behavior is exactly what you get on SQL Server 2005. Resources are distributed as they are being requested and whoever needs more CPU simply gets it. This was one of the goals: to make behavior as close as possible to SQL Server 2005 when you do not use the Resource Governor or adjust any parameters of it. Note, however, by creating the groups and pools you are already slightly altering the behavior of SQL Server 2008, but more on this in the next few posts.
Step 6: Management Actions

Now we came to the point where we want to apply action to change the above picture. Specifically, we want our VP workload to proceed faster and thus, limit CPU usage by Marketing and Adhoc workloads to 50% of the CPU.

To do this, we alter the PoolMarketingAdhoc using the following syntax (remember, we created the pool using all default parameters):

-- adjust PoolMarketingAdhoc to not consume more than 50% of CPU

ALTER RESOURCE POOL PoolMarketingAdhoc

WITH (MAX_CPU_PERCENT = 50)

Remember to make changes effective:

-- make the changes effective

ALTER RESOURCE GOVERNOR RECONFIGURE

Now, let's look at the counters:

For groups:

CPU usage % for groups after management actions

For pools:

CPU usage % for pools after management actions

What you will see is what happened at point D: Pool usage by PoolMarketingAdhoc went back to half of the CPU while PoolVP took the remaining part of it. Note that, on group side GroupVP usage went up to use half of the CPU (it has a single workload) while groups Marketing and Adhoc divide another half equally (12.5% each).

Further step is to alter IMPORTANCE parameters of the Marketing and Adhoc groups. IMPORTANCE affects CPU distribution when two groups share the same pool. Value of importance has a numeric meaning for CPU bandwidth distribution. Ratio of Low:Medium:High IMPORTANCE is equal to 1:3:9.

To illustrate this, let's run the following:

-- alter importance of adhoc group

ALTER WORKLOAD GROUP GroupAdhoc

WITH (IMPORTANCE = Low)

 

-- alter importance of marketing group

ALTER WORKLOAD GROUP GroupMarketing

WITH (IMPORTANCE = High)

 

-- make the changes effective

ALTER RESOURCE GOVERNOR RECONFIGURE

When changes become effective corresponds to the point E on the above graphs. Note that distribution between pools does not change, amount of CPU available to GroupVP also does not change (there are no other groups in this pool), but GroupMarketing takes about 10 times of the bandwidth available to GroupAdhoc (remember the ratio of Low:High = 1:9, but together they stay within PoolMarketingAdhoc limits.

 

That's enough for the first post. In the upcoming post I will try to go over the cases when the demo does not seem to work or shows "unusual" behavior and possible explanations of it.