Conor vs. ColumnStore Index Stats

 

I haven’t posted much on the new SQL Server 2012 features yet, but I’m going to start doing so soon.  I’ll also be speaking about them at upcoming conferences. 

We have an exciting new index type in SQL Server 2012 that is column-oriented called the ColumnStore index.  It is a multi-column index that, when combined with other features in SQL Server, can make certain Data Warehouse (DW) queries run significantly faster.  I did a demo of this in Serbia in October (you can go see the slides which are posted on this blog).  Some queries go from minutes to seconds, so the combination of column orientation, compression, and new execution algorithms make queries of this general form faster:

SELECT SUM(col), col2, col3

FROM FactTbl Join Dim1 Join Dim2 …

WHERE …

GROUP BY …

The question I received had to do with whether there is a stats blob for the ColumnStore index.  From one of my example databases, there appears to be one but it is curiously ‘empty’:

clip_image002

Here’s what is happening:

In SQL Server 2012, the ColumnStore index is always nonclustered.  The logic for selecting this is limited to star join query patterns (like the above), which happen to be picked only in complex query scenarios in SQL Server.  Optimizers have an ordering in which they do things – each one is a little different, but you make choices when building one to do things like normalize scalar expressions before picking an initial join order, as an example.  For those who have read other work from me, you’ll occasionally here me talk about this when going into deep dives about the SQL Server Optimizer. 

SQL’s Optimizer considers star join optimizations after doing initial estimates for cardinality.  This means that it determines how many rows are likely to go through each operation in the query before it starts to consider whether to do the star join optimization transformations.  (Most of the stuff you see in the stats blob relates to cardinality estimation).  So SQL’s Optimizer does not need cardinality estimation information from an index-created statistic when you create a ColumnStore index.

That’s great – but why is it still there?  The answer is that we DO use information in the BLOB for costing.  We just don’t (yet) expose anything to see what is there for costing purposes.  So you see a stats entry but it appears empty for now.

In the future we hope to extend this to expose more of the information in the BLOB – for now, we have some extra information that is (almost) peeking through.

I hope that explains the oddity.

Thanks,

Conor Cunningham