Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on "SQL Server 2008 Statistics" I took while attending an advanced class on SQL Server taught by Kimberly Tripp (https://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Understanding the job of the cost estimator
- Table Scan (always an option)
- Useful Indexes (look at WHERE, JOIN, GROUP BY, ORDER BY, covering)
- Other options?
Statistics
- Statistics = density information about a table kept by SQL
- Help SQL "estimate" how many rows a certain query will return
- Stored as a BLOB, but you can query the data using "DBCC SHOW_STATISTICS"
Not always right
- Statistics cannot be right for every case, or else it would be actual data
- By definition, it is not all the data, it's a sample.
- How frequently do you update? Depends on how the data is changing.
- Statistically, certain types of data are consistent over time.
- Unless that are some atypical events related to your data.
Could be statistically correct after update
- Look at the city column for an employee table for a company based in Seattle
- Think of the frequency for Seattle, Redmond (suburb) and Spokane (hours away)
- This would probably not change much (statistically) quickly
- High number of rows with low selectivity - this is harder.
Looking at stats
- Things to look in your statistics (DBCC SHOW_STATISTICS):
- How old are your statistics?
- How many rows sampled in the statistics (out of the total rows)?
- How many steps in the statistics (up to 201 steps in SQL Server 2008)?
- Steps = number of rows in your histogram.
- What is the density of that key?
- Density * # of Rows = average rows returned for a given value
- You also get the average length of the column.
- For every step: actual value (high for step), total rows, count of unique rows.
- Data could be sampled (not based on every row, but just a subset).
AUTO
- Generating statistics is usually not so expensive, it's done with sampling
- The query optimizer will actually create statistics on the fly, as queries come in.
- Future queries will benefit from having created that.
- If you don't have the stats, SQL will not be sure of the usefulness of an index.
- AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are ON by default.
Stats with no index?
- Statistics do make sense in cases when you don't have indexes on a column
- For instance, for helping with deciding how to do sorts
- For instance, in a JOIN, to figure out which side has a smaller set of rows.
Let SQL do it
- Let SQL Server do its thing with stats
- You may start with a query taking a long time, than it improves "by itself"
- This could be the results of statistic being autogenerated and helping you
- Careful: On Read-only database, SQL cannot create statistics
Pre-create statistics
- You can also pre-create the statistics
- You could pre-create for every column of every table, but you probably shouldn't
- It probably helps generate them for every column of your non-clustered indexes
- You can do that using "sp_createstats 'indexonly', 'fullscan'"
Leave it on
- Leave autocreate/autoupdate on. It's fine in most cases
- For most cases, you better off leaving them on
- If you know what you're doing, you can manage it manually
- You can still leave it on, since it will do only what's missing
- In that case, use STATISTICS_NORECOMPUTE
- Having one specific "problem child" database with scheduled updates with full scans.
- Details at https://www.sqlskills.com/BLOGS/KIMBERLY/post/Auto-update-statistics-and-auto-create-statistics-should-you-leave-them-on-andor-turn-them-on.aspx
Stats get old
- After they are created, statistics will get old over time, as data gets updated
- You can use sp_updatestats to query
- You can also query sys.stats and sys.indexes
- You can use DBCC SHOW_STATISTICS (with HISTOGRAM?)
Auto updates
- Statistics are automatically updated if AUTO_UPDATE_STATISTICS are ON (default)
- They get updated when about 20% of the data changes
- In SQL 7, tracks row mod, immediately updates
- In SQL 2000, tracks row mod, updates when needed
- In SQL 2005, tracks column mod, updates when needed
- In SQL 2008, tracks column mod, updates when needed. New: Filtered stats.
Async updates
- Even if stats need updating, you can use old stats if new ones are not ready
- To do that, use ALTER DATABASE dbname AUTO_UPDATE_STATISTICS_ASYNC ON
- Use with caution
Sampling
- Sampling is usually OK, but could create issues if data is not evenly distributed
- You might miss something significant if your sampling does not give you enough
- You can disable autoupdate and update statistics manually using a full scan
- To do that "UPDATE STATISTICS... WITH FULL SCAN".
Very large sets
- For very large data sets, even a full scan (no sampling) might not be enough.
- Think a few billion sales records, looking at statistics for customer_number.
- In that case, you have 1000 large customers and 1 million small ones.
- With only 201 steps, you just can't get good statistics for both large and small.
- For those, consider partitioning into multiple tables to get better stats.
- In SQL 2008, consider filtered statistics (for instance, WHERE customer_type=Large).
- Automatically created for filtered indexes. Can be created manually.
Conclusion
- Statistics will only help
- However, it is always better to have the right indexes
- As usual, you can't optimize for every single query
- Pick your battles.
For details, see "Statistics Used by the Query Optimizer in Microsoft SQL Server 2005" at:
https://technet.microsoft.com/en-us/library/cc966419.aspx