Microsoft SQL Server Performance Top Tip: Degrees of Parallelism and a Degree of Uncertainty
Max degree of parallelism is an advanced configuration option in SQL Server about which many questions have been asked and much has been written. In this blog entry I hope to offer some insight and certainty as to what this option does and how it should be used.
Firstly, I must dispel any idea you might have that this setting has anything to do with how many processors SQL Server can use when servicing more than one connection (or user) – it hasn’t. If SQL Server has access to four idle processors and it has four processes to run, it will use all four processors, regardless of the max degree of parallelism setting.
So what does this option do? It sets the maximum number of processors SQL Server can use for a single query. If SQL Server has to bring back a lot of data (lots of rows) for a query, it sometimes makes sense to break the query down into a number of smaller queries, each returning a subset of the total rows. This way SQL Server can make use of more than one processor, and hence on multi-processor boxes it can potentially return the rows for a given query much more quickly than it can on a single-processor box.
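As an illustration, you can see whether the optimizer has chosen to decompose a query by looking for Parallelism (exchange) operators in the showplan output. A sketch (the table name here is hypothetical – any large table with an aggregate over it is a typical candidate):

```sql
-- Show the estimated plan instead of running the query;
-- look for 'Parallelism' operators in the output.
SET SHOWPLAN_TEXT ON
GO
SELECT ProductID, COUNT(*)          -- BigSalesTable is a hypothetical large table
FROM BigSalesTable
GROUP BY ProductID
GO
SET SHOWPLAN_TEXT OFF
GO
```

If no Parallelism operators appear, the optimizer has settled on a serial plan for that query.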
There are a number of criteria that have to be met before SQL Server will invoke ‘intra-query parallelism’ (breaking the query down into more than one stream); there’s no point in detailing them here, as you can find them in BOL under ‘Degree of parallelism’. Suffice to say the decision is based on the available memory on the box and, most importantly, the availability of the processors.
So why do we need to think about this option? Well, because leaving it at the default (letting SQL Server decide) can sometimes give you undesirable effects. These effects are:
- Queries may go slower with parallelism.
- Query time for a particular query may become non-deterministic, and this can annoy users. The time may vary because:
  - The query may sometimes go parallel and sometimes not.
  - The query may get blocked by a parallel query in front of it hogging the processors.
Before we go on, I don’t think there’s any real benefit in diving into the internals of parallelism, if you’re interested you can look at ‘Parallel Query Processing’ in Books on Line for more detailed information. I think there are only two important things to know about the internals of parallelism:
1. Parallel queries may spawn many more threads than specified in the max degree of parallelism option. DOP 4 could allow for 12 or more threads: four for the query itself, with the extra threads used for sorts, aggregates, gathering streams and so on.
2. Parallel queries can cause spids to wait with a waittype of CXPACKET (0x0200).
So armed, you can now identify spids waiting on parallel operations, as they will have a waittype of CXPACKET in sysprocesses. To help you here you can use the stored procs in my blog entry CREATE proc track_waitstats & CREATE proc get_waitstats - useful code for diagnosing SQL 2000 performance.
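For a quick look without the stored procs, a simple query against sysprocesses will show any spids currently sitting on the parallel exchange wait:

```sql
-- Spids waiting on a parallel exchange show CXPACKET
-- (waittype 0x0200) in master..sysprocesses.
SELECT spid, blocked, waittype, lastwaittype, waittime, cmd
FROM master..sysprocesses
WHERE lastwaittype = 'CXPACKET'
```

A few of these appearing briefly is normal for a parallel query; spids stuck here for long periods suggest one of the streams is lagging behind the others.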
So, ‘queries may go slower with parallelism’ – how come?
- If the system has very poor data throughput on the disk subsystem(s), then query decomposition is unlikely to make the query faster.
- The data may be skewed, or locks may be held on a range of data assigned to one CPU, leaving one parallel stream lagging behind the others.
- An index may be missing for a predicate, resulting in a table scan. A parallel query operation can hide the fact that the query would run much faster with a serial plan and an appropriate index.
Consequently, it is recommended that you also test with parallelism disabled (DOP = 1) so these issues can be identified.
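You don’t need to change the server-wide setting to do this test – the MAXDOP query hint forces a serial plan for one query only. A sketch (the table name is hypothetical):

```sql
-- Compare elapsed/CPU time for the serial plan against the parallel one.
SET STATISTICS TIME ON
SELECT ProductID, COUNT(*)
FROM BigSalesTable                  -- hypothetical large table
GROUP BY ProductID
OPTION (MAXDOP 1)                   -- force a serial plan for this query only
SET STATISTICS TIME OFF
```

If the serial run is competitive with (or faster than) the parallel run, parallelism is buying you nothing on this query and may be masking a missing index.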
The other effects of parallelism mentioned above should be self-explanatory, and should lead you to the conclusion that intra-query parallelism is not really appropriate for OLTP applications. These are apps where varying query times annoy users, and where a server servicing a number of concurrent users is unlikely to choose a parallel plan anyway because of its CPU workload profile.
So if you are going to employ parallelism it is most likely going to be in a data warehouse, decision support or reporting scenario, where there are a low number of high workload queries on a big fat box with plenty of memory.
So you have decided to employ parallelism – what should you set DOP to? One guideline that crops up internally is that if you have 8 processors then DOP = 4 is likely to be the optimal setting. However, there is no mast here and no flag pinned to it – testing is the only way to be sure. Further to this I have not seen any other figures, but having said all that, my advice would be never to set this figure to anything more than half the number of processors available. If I had fewer than 6 processors I would set DOP to 1, which doesn’t allow any parallelism to take place. I might make an exception if I had a database that was only going to support a single user process (some sort of data warehousing or reporting task) – here the exception would be to set DOP to 0 (the default), which lets SQL Server decide.
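The setting itself is changed with sp_configure. A sketch for the 8-processor example above (max degree of parallelism is an advanced option, so show advanced options must be on first):

```sql
-- Make advanced options visible, then cap parallel queries at 4 CPUs.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE
```

The change takes effect immediately; no restart is required.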
Before I finish my ramble – beware that the degree of parallelism used for parallel index creation also depends on the figure you set for DOP. This means you may want to change it at times to suit what you’re doing, and of course you can use the MAXDOP query hint to override the configuration option for those out-of-hours jobs.
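For example, you might open the setting up for an out-of-hours index build and then put the daytime value back afterwards (the values, table and index names here are purely illustrative):

```sql
-- Let the overnight index build use all the processors...
EXEC sp_configure 'max degree of parallelism', 0
RECONFIGURE

CREATE INDEX ix_sales_product
ON BigSalesTable (ProductID)        -- hypothetical table and index

-- ...then restore the daytime setting.
EXEC sp_configure 'max degree of parallelism', 1
RECONFIGURE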
Finally, your query may go slower using parallelism because of a bug – make sure you have the latest service pack.