Understanding Segment Elimination
We've already given an overview of SQL Server columnstore segment (rowgroup) elimination in the fundamentals section. To illustrate further, if you examine sys.column_store_segments, you'll see that the minimum and maximum values are stored for each column segment. The important columns in this table for the purposes of understanding segment elimination are: column_id, segment_id, min_data_id, and max_data_id.
The min_data_id and max_data_id columns identify the minimum and maximum values in segment segment_id for column column_id. We say "identify" because these min and max values may either be literal values, or they may identify those values in some other way, such as with a reference into a dictionary. Assume they actually contain the values for purposes of this discussion. Segment elimination works for number, date, and time data types. Strings aren't supported for segment elimination in SQL Server 2012 RTM.
Suppose there is a table Purchase(Date, ...) where Date is an int column containing date keys of the form YYYYMMDD. Assume the following information is maintained for the Date column:
column_id | segment_id | min_data_id | max_data_id |
1 | 1 | 20120101 | 20120131 |
1 | 2 | 20120115 | 20120215 |
1 | 3 | 20120201 | 20120228 |
While scanning the columnstore index for the following query, the SQL Server storage engine would recognize by looking at the information above that only segments 2 and 3 could possibly contain qualifying rows.
select Date, count(*)
from dbo.Purchase
where Date >= 20120201
group by Date
Segment 1 would be skipped. Its data wouldn't be read from disk or scanned.
SQL Server can even skip segments using join operations. This is especially useful when joining a fact table that has a columnstore index with a Date dimension table. For example, consider this query:
select p.Date, count(*)
from dbo.Purchase p, dbo.Date d
where p.Date = d.DateId
and d.Year = 2012
and d.Month = 2
group by p.Date
This query will also skip segment 1. In general, segments can be skipped for joins with the Date dimension for simple ranges, or multiple non-overlapping ranges. For example, segments can be skipped for a pattern that identifies Date dimension rows for Saturdays and Sundays over a one year period. Moreover, segments can be skipped based on combinations of joins on multiple columns and direct filter predicates on multiple columns.
In the past, there have been situations where partitions could have been skipped by SQL Server, but they ended up being scanned anyway. This problem was common when trying to do date range elimination via a hash join between a fact able and and a Date dimension. Now, with columnstores, each partition will be composed of one or more segments. Even if a partition is examined during query processing, the segments within it can be skipped if no data in the partition qualifies. So columnstore query processing can effectively skip partitions if they contain no data that qualifies.