Share via

Introduction to New Data Warehouse Scalability Features in SQL Server 2008

SQL Server Technical Article


Writers:  Eric N. Hanson, Kevin Farlee, Stefano Stefani, Shu Scott, Gopal Ashok, Torsten Grabs, Sara Tahir, Joachim Hammer, Sunil Agarwal, T.K. Anand, Richard Tkachuk, Catherine Chang, and Edward Melomed, Microsoft Corp.

Technical Reviewer: Eric N. Hanson, Microsoft Corp.

Published: December 2007

Applies To: SQL Server 2008


Summary: With the 2008 release, SQL Server makes a major advance in scalability for data warehousing. It meets the data warehouse needs of the largest enterprises more easily than ever. SQL Server 2008 provides a range of integrated products that enable you to build your data warehouse, and query and analyze its data. These include the SQL Server relational database system, Analysis Services, Integration Services, and Reporting Services. This paper introduces the new performance and manageability features for data warehousing across all these components. All these features contribute to improved scalability.


Microsoft® SQL Server™ 2008 provides a comprehensive data warehouse platform. It enables you to build and manage your data warehouse, and deliver insight to your users, with a single, integrated product suite. It scales to meet the needs of the largest enterprises, in a way that empowers both your end users and your IT staff.

The number one focus of development in the SQL Server 2008 release was to improve scalability across the entire product suite to comfortably meet the needs of large enterprises. Here, we’ll introduce the features and enhancements we’ve added to improve your data warehouse experience. Build. Manage. Deliver. SQL Server 2008 lets you do it all, with ease.

Map of New Data Warehousing Features

The following table shows the new scalability features in SQL Server 2008, and where they help with the activities that surround your data warehouse (DW).





Deliver Insight

SQL Server Relational DBMS

MERGE statement

Change data capture (CDC)

Minimally logged INSERT

Backup compression

Star join performance

Faster parallel query on partitioned tables


Resource governor

Data compression

Partition-aligned indexed views

Integration Services

Lookup performance

Pipeline performance



Analysis Services



MDX Query Performance: Block Computation

Query and Writeback Performance

Scalable Shared Database

Reporting Services


Reporting scalability

Server scalability


Table 1: SQL Server 2008 improvements

This white paper briefly describes the data warehousing enhancements in each of the different components of SQL Server 2008, and how they help you get the most from your data warehouse. For full details on how to use these features, see SQL Server 2008 Books Online (BOL).

SQL Server Relational DBMS DW Improvements

The SQL Server 2008 relational DBMS contains significant advances over earlier releases, so that it performs better when you create, manage, and query large data warehouses. This section elaborates on the relational DBMS data warehouse improvements listed in Table 1.

Star Join

With dimensionally modeled data warehouses, a big part of your workload typically consists of what are known as star join queries. These queries follow a common pattern that joins the fact table with one or several dimension tables. In addition, star join queries usually express filter conditions against the non-key columns of the dimension tables and perform an aggregation (typically SUM) on a column of the fact table (called a measure column). With SQL Server 2008, you will experience significant performance improvements for many star join queries that process a significant fraction of fact table rows.

The new technology employed is based on bitmap filters, also known as Bloom filters (see Bloom filter, Wikipedia 2007, It allows SQL Server to eliminate non-qualifying fact table rows from further processing early during query evaluation. This saves a considerable amount of CPU time compared to query processing technologies used by competing products. While your results may vary, we’ve typically seen entire relational data warehouse query workloads experience performance improvements of 15-25% when using the new star join query processing capability. Some individual queries speed up by a factor of seven or more.


Figure 1: Star join query plan with join reduction processing for efficient DW

The new star join optimization uses a series of hash joins, building a hash table for each dimension table that participates. As a byproduct of building this hash table, additional information, called a bitmap filter, is built. Bitmap filters are represented as boxes in Figure 1, labeled “Join Reduction Info.” These filters are pushed down into the scan on the fact table, and effectively eliminate almost all the rows that would be eliminated later by the joins. This eliminates the need to spend CPU time later copying the eliminated rows and probing the hash tables for them. The illustration shows the effect of this filtering within the fact table scan. The SQL Server 2008 query executor also re-orders the bitmaps during execution, putting the most selective one first, then the next most selective one, and so forth. This saves more CPU time, because once a fact table row fails a check against a bitmap, the row is skipped.

The new star join optimization is available in Microsoft SQL Server 2008 Enterprise Edition. The query processor in SQL Server applies the optimization automatically to queries following the star join pattern when this is attractive in terms of estimated query cost. You do not need to make any changes to your application to benefit from this significant performance improvement.

Improved Parallelism

Wouldn’t you like to get the most power you can out of the hardware you own? The partitioned table parallelism (PTP) and the few outer rows parallelism features in SQL Server 2008 help you do that.

Partitioned Table Parallelism

Data warehouse applications typically collect large amounts of historical data in fact tables, which are often partitioned by date. In SQL Server 2005, queries that touch more than one partition use one thread (and thus one processor core) per partition. This sometimes limits the performance of queries that involve partitioned tables, especially when running on parallel shared memory multiprocessor (SMP) computers with many processor cores. Partitioned table parallelism improves the performance of parallel query plans against partitioned tables by better utilizing the processing power of the existing hardware, regardless of how many partitions a query touches. The feature works by default without the need for manual tuning or configuration. The following figure illustrates the impact of partitioned table parallelism in a typical data warehouse scenario.


Figure 2: Partitioned table parallelism.

Assume that we have a fact table representing sales data organized by sales date across four partitions, each containing seven days of data, as shown in the top portion of the figure. Query Q summarizes sales over seven days. The query can touch different partitions depending on when it is executed. This is illustrated by query Q1, which touches a single partition P2 and by Q2, which touches two partitions since the relevant data at the time of execution spans P3 and P4.

*Executing Q1 and Q2 in SQL Server 2005 may generate some unexpected behavior. Because there is special-case logic that can allocate all threads to a single-partition query, Q1 results in a parallel plan involving P3 that is processed by all available threads (execution not shown in the figure). In the case of Q2, however, the executor assigns a single thread each to partitions P3 and P4 (see Old Allocation in the figure) even if the underlying hardware has additional threads available. Hence on an 8-way computer, Q2 utilizes only 2/8 (25%) of the available CPU power and very probably executes much slower than Q1. *

Executing Q1 and Q2 in SQL Server 2008 results in better utilization of the available hardware, and thus in better performance as well as more predictable behavior. In the case of Q1, the executor again assigns all available threads to process data in P2 (not shown). Q2 results in a parallel plan in which the executor assigns all available threads to both P3 and P4 round-robin style, producing the effect illustrated in the figure under New Allocation. The CPU remains fully utilized and the performance of Q1 and Q2 are comparable. Under this new round-robin allocation of threads, the performance boost provided by partitioned table parallelism becomes more pronounced the more processor cores there are compared to the number of partitions affected by a query. When all data accessed by a query is in the main memory buffer pool, which is typical for the most recent partitions, we’ve observed speedups of 16 times or more in internal tests for queries that touch two partitions. Actual results depend on the query, data organization, and hardware configuration.

For details on the thread allocation strategy as well as on the manageability features of partitioned table parallelism, see SQL Server 2008 Books Online.


Few Outer Rows Parallelism

SQL Server 2008 enables parallelism for nested loop joins even when the outer side of the join has only a few rows.  In SQL Server 2005, if multiple threads are available, each thread is allocated a page of rows from the outer side of the join.  If there are only a few rows, they are likely to be on the same page.  In such cases, only one thread is employed and the potential benefits of parallelism are lost.  SQL Server 2008 will recognize such cases and introduce an exchange operator that allocates one row per thread so that all available CPUs are employed.  The increased parallelism means that CPU consumption will increase temporarily as compared to SQL Server 2005, but query execution will be faster.  This new behavior is only seen if the number of outer rows is small and if the cost of the query is estimated to be large enough to benefit from the additional parallelism.  If the query cost is estimated to be small, or if the cardinality estimate for the outer side is > 1000, SQL Server will allocate one page per thread as in SQL Server 2005.  Additional detail about exchange operators and parallel query processing is available in Books On Line at Database Engine > Planning and Architecture > Query Processing Architecture > Parallel Query Processing.

Partition-Aligned Indexed Views

Partition-aligned indexed views enable you to create and manage summary aggregates in your relational data warehouse more efficiently, and use them in scenarios where you couldn’t effectively use them before, improving query performance. In a typical scenario, you have a fact table that is partitioned by date. Indexed views (summary aggregates) are defined on this table to help speed up queries. When you switch in a new table partition, the matching partitions of the partition-aligned indexed views defined on the partitioned table switch too, and do so automatically.

This is a significant improvement over SQL Server 2005, where you must drop any indexed views defined on a partitioned table before using the ALTER TABLE SWITCH operation to switch a partition in or out of it. The partition-aligned indexed views feature in SQL Server 2008 gives you the benefits of indexed views on large partitioned tables, while avoiding the cost of rebuilding aggregates on an entire partitioned table. These benefits include automatic maintenance of aggregates, and indexed view matching (automatic query rewrite to use the aggregates to solve queries that refer only to the base tables, not the aggregates). For more details on indexed views, see Improving Performance with SQL Server 2005 Indexed Views on Microsoft TechNet.

The following figure shows how aggregates move with base table partitions when switching in a partition.


Figure 3: Partition-aligned indexed views


GROUPING SETS allow you to write one query that produces multiple groupings and returns a single result set. The result set is equivalent to a UNION ALL of differently grouped rows. By using GROUPING SETS, you can focus on the different levels of information (groupings) your business needs, rather than the mechanics of how to combine several query results. GROUPING SETS enables you to write reports with multiple groupings easily, with improved query performance.

In this simple but typical example, using the AdventureWorksDW sample database, you may want to see the following aggregates for a specific reporting period:

·         Total sales amount by quarter and country

·         Total sales amount by quarter for all countries

·         The grand total

To get this result without GROUPING SETS, you must either run multiple queries or if one result set is desired, use UNION ALL to combine these queries. With GROUPING SETS, your query can be expressed like this:

SELECT   D.CalendarYear, D.CalendarQuarter, T.SalesTerritoryCountry

      , SUM(F.SalesAmount) AS SalesAmount

FROM  dbo.FactResellerSales F      

      INNER JOIN dbo.DimTime D ON F.OrderDateKey = D.TimeKey

      INNER JOIN dbo.DimSalesTerritory T ON

         F.SalesTerritoryKey = T.SalesTerritoryKey

WHERE D.CalendarYear IN (2003,2004)


        (CalendarYear, CalendarQuarter, SalesTerritoryCountry)

      , (CalendarYear, CalendarQuarter) 

      , () )

ORDER BY D.CalendarYear, D.CalendarQuarter, T.SalesTerritoryCountry

Typically, you display the result of this query as a type of pivot table, like this:




Period Totals





Year, Quarter

























Grand Total:



Table 2: Output from a GROUPING SETS query, formatted as a pivot table

As the number of possible groupings increases, the simplicity and performance benefits provided by GROUPING SETS become even greater.


The MERGE statement allows you to perform multiple Database Manipulation Language (DML) operations (INSERT, UPDATE, and DELETE) on a table or view in a single Transact-SQL statement. The target table or view is joined with a data source and the DML operations are performed on the results of the join. The MERGE statement has three WHEN clauses, each of which allows you to perform a specific DML action on a given row in the result set:

·         For every row that exists in both the target and the source, the WHEN MATCHED clause allows you to UPDATE or DELETE the given row in the target table.

·         For every row that exists in the source but not in the target, the WHEN NOT MATCHED [BY TARGET] clause allows you to INSERT a row into the target.

·         For every row that exists in the target but not in the source, the WHEN NOT MATCHED BY SOURCE clause allows you to UPDATE or DELETE the given row in the target table.

You can also specify a search condition with each of the WHEN clauses to choose which type of DML operation should be performed on the row. The OUTPUT clause for the MERGE statement includes a new virtual column called $action, which you can use to identify the DML action that was performed on each row.

In the context of data warehousing, the MERGE statement is used to perform efficient INSERT and UPDATE operations for Slowly Changing Dimensions (SCD) and to maintain the fact table in various common scenarios. The MERGE statement has better performance characteristics than running separate INSERT, UPDATE, and DELETE statements since it only requires a single pass over the data.

SQL Server 2008 also includes a powerful extension to the INSERT statement that allows it to consume rows returned by the OUTPUT clause of a nested INSERT, UPDATE, DELETE, or MERGE statements.

Suppose you have a DimBook table (ISBN, Price, IsCurrent) that tracks the price history and current price for each book in a bookstore. Price changes and new book additions are made on a weekly basis. Every week a source table WeeklyChanges (ISBN, Price) is generated and these changes are applied to the DimBook table. A row is inserted for each new book. Existing books whose prices have changed during the week are updated with IsCurrent=0 and a new row is inserted to reflect the new price. The following single Transact-SQL statement performs these operations using the new MERGE and INSERT capabilities.

INSERT INTO DimBook(ISBN, Price, IsCurrent)

    SELECT ISBN, Price, 1



        MERGE DimBook as book

        USING WeeklyChanges AS src

        ON (book.ISBN = src.ISBN and book.IsCurrent = 1)


            UPDATE SET book.IsCurrent = 0


            INSERT VALUES (src.ISBN, src.Price, 1)

        OUTPUT $action, src.ISBN, src.Price

    ) AS Changes(action, ISBN, Price)

    WHERE action = 'UPDATE';

Change Data Capture

Change Data Capture (CDC) is one of the new data tracking features introduces in SQL Server 2008. Mainly designed for data warehousing scenarios, Change Data Capture provides an efficient mechanism to track and gather data changes made to user tables and gives you access to change data in an easily consumable relational format. Typically, you use CDC in an operational database to capture changes for later movement into your data warehouse. The availability of CDC in SQL Server eliminates the need for using intrusive methods such as user triggers, timestamp columns, and expensive queries to determine what changed in the operational system.

The auxiliary information gathered along with the change data allows CDC to provide answers to a variety of questions. For example, here is a set of questions for which CDC can provide the answers efficiently:

·         I want all rows that changed between 12:00 A.M. and 12:00 P.M.

·         I need to know whether a change is an insert, update, or delete.

·         For an updated row, I want to know which column(s) changed.

One of the scenarios in which CDC can be extremely beneficial is Extract, Transform and Load (ETL). With the explosion of data volume and shrinking maintenance windows due to global operations, it is critical to optimize the ETL process. Change Data Capture provides you with a very efficient way to extract changes on an incremental basis, reducing overall ETL processing time.

The following diagram provides an overview of the components that make up Change Data Capture.


Figure 4: Change Data Capture

CDC uses a capture job to extract changes from the SQL Server transaction log, and populate change tables. The CDC API allows you to write an application to get information from change tables. You can use this in your ETL packages. The CDC cleanup job removes information that is no longer needed from change tables.

Minimally Logged INSERT

In general, when you write data to a user database, you must write it to disk twice: once to the log, and once to the database itself. This is because the database system uses an undo/redo log so it can rollback or redo transactions when needed. But it’s possible to write the data to disk only once in some important cases that involve inserting data into existing tables, thereby speeding up your ETL processes significantly. This is what the new minimally logged INSERT feature does in SQL Server 2008. A factor of two or more speedup is common with minimal logging compared with full logging. Your results will depend on your application and hardware.

Minimal logging consists of logging only the information that is required to rollback the transaction without supporting point-in-time recovery. Minimal logging is only available under the bulk logged and simple recovery models. When a minimally logged transaction is committed, the log records and the data pages containing the newly inserted rows are flushed to the disk to guarantee the durability of the transaction. Minimal logging greatly improves the performance of large scale INSERT operations by reducing the number of log records to be written and the amount of log space required to complete the operation. For a discussion of table requirements for minimal logging, see SQL Server Books Online. In particular, you must use table locking (TABLOCK) on the target table.

Operations that can be minimally logged in SQL 2005 include bulk import operations, SELECT INTO, and index creation and rebuild. SQL 2008 extends the optimization to INSERT INTO…SELECT FROM T-SQL operations that insert a large number of rows into an existing target table when that table is a heap that has no nonclustered indexes, and the TABLOCK hint is used on the target. The optimization works whether the target table is empty or contains data.

A key scenario for using minimally logged INSERT is this: you create an empty table on specific file groups, so you can control where the data is physically placed. Then you use INSERT INTO…SELECT FROM to populate it, in a minimally logged fashion. This puts the data where you want it, and only writes it to disk once. Once the data is loaded, you can then create the required indexes. It is important to note that indexes themselves can be created with minimal logging.

Data Compression

The new data compression feature in SQL Server 2008 reduces the size of tables, indexes or a subset of their partitions by storing fixed-length data types in variable length storage format and by reducing the redundant data. The space savings achieved depends on the schema and the data distribution. Based on our testing with various data warehouse databases, we have seen a reduction in the size of real user databases up to 87% (a 7 to 1 compression ratio) but more commonly you should expect a reduction in the range of 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).

SQL Server provides two types of compression as follows:

·         ROW compression enables storing fixed length types in variable length storage format. So for example, if you have a column of data type BIGINT which takes 8 bytes of storage in fixed format, when compressed it takes a variable number of bytes—anywhere from 0 bytes to up to 8 bytes. Since column values are stored as variable length, an additional 4‑bit length code is stored for each field within the row. Additionally, zero and NULL values don’t take any storage except for the 4‑bit code.

·         PAGE compression is built on top of ROW compression. It minimizes storage of redundant data on the page by storing commonly occurring byte patterns on the page once and then referencing these values for respective columns. The byte pattern recognition is type-independent. Under PAGE compression, SQL Server optimizes space on a page using two techniques.

The first technique is column prefix. In this case, the system looks for a common byte pattern as a prefix for all values of a specific column across rows on the page. This process is repeated for all the columns in the table or index. The column prefix values that are computed are stored as an anchor record on the page and the data or index rows refer to the anchor record for the common prefix, if available, for each column.

The second technique is page level dictionary. This dictionary stores common values across columns and rows and stores them in a dictionary. The columns are then modified to refer to the dictionary entry.

Compression comes with additional CPU cost. This overhead is paid when you query or execute DML operations on compressed data. The relative CPU overhead with ROW is less than for PAGE, but PAGE compression can provide better compression. Since there are many kinds of workloads and data patterns, SQL Server exposes compression granularity at a partition level. You can choose to compress the whole table or index or a subset of partitions. For example, in a DW workload, if CPU is the dominant cost in your workload but you want to save some disk space, you may want to enable PAGE compression on partitions that are not accessed frequently while not compressing the current partition(s) that are accessed and manipulated more frequently. This reduces the total CPU cost, at a small increase in disk space requirements. If I/O cost is dominant for your workload, or you need to reduce disk space costs, compressing all data using PAGE compression may be the best choice. Compression can give many-fold speedups if it causes your working set of frequently touched pages to be cached in the main memory buffer pool, when it does not otherwise fit in memory. Preliminary performance results on one large-scale internal DW query performance benchmark used to test SQL Server 2008 show a 58% disk savings, an average 15% reduction in query runtime, and an average 20% increase in CPU cost. Some queries speeded up by a factor of up to seven. Your results depend on your workload, database, and hardware.

The commands to compress data are exposed as options in CREATE/ALTER DDL statements and support both ONLINE and OFFLINE mode. Additionally, a stored procedure is provided to help you estimate the space savings prior to actual compression.

Backup Compression

Backup compression helps you to save in multiple ways.

By reducing the size of your SQL backups, you save significantly on disk media for your SQL backups. While all compression results depend on the nature of the data being compressed, results of 50% are not uncommon, and greater compression is possible. This enables you to use less storage for keeping your backups online, or to keep more cycles of backups online using the same storage.

Backup compression also saves you time. Traditional SQL backups are almost entirely limited by I/O performance. By reducing the I/O load of the backup process, we actually speed up both backups and restores.

Of course, nothing is entirely free, and this reduction in space and time come at the expense of using CPU cycles. The good news here is that the savings in I/O time offsets the increased use of CPU time, and you can control how much CPU is used by your backups at the expense of the rest of your workload by taking advantage of the Resource Governor.

Resource Governor

The new Resource Governor in SQL Server 2008 enables you to control the amount of CPU and memory resources allocated to different parts of your relational database workload. It can be used to prevent runaway queries (that deny resources to others) and to reserve resources for important parts of your workload. SQL Server 2005 resource allocation policies treat all workloads equally, and allocate shared resources (for example, CPU bandwidth, and memory) as they are requested. This sometimes causes a disproportionate distribution of resources, which in turn results in uneven performance or unexpected slowdowns.

The primary goals of the Resource Governor are as follows:

1.    Monitoring: Enable monitoring of resource consumption per group of requests (workload group).

2.    Predictability: Provide predictable execution of workloads in an environment where there is resource contention. This is achieved by explicitly specifying resource boundaries between workloads (via resource pool controls). The implementation of resource boundaries also prevents, or reduces the probability of run-away queries. The monitoring capability provided with the Resource Governor facilitates easier detection of run-away queries.

3.    Prioritization: Enable the prioritization of workloads.

There are three new concepts which are important to understanding of the resource governor: workload groups, resource pools, classification (and classifier user-defined functions).

·         Group: A workload group, or group, is a user-specified category of requests that are similar according to the classification rules that are applied to each request. The value of a group is in the aggregate monitoring of resource consumption and a uniform policy that is applied to all the requests in a group. A group defines the policies for its members.

·         Pool: A resource pool, or pool, represents a portion of the physical resources of the server. Depending on its settings, a pool may have a fixed size (its minimum and maximum resource usage settings are equal to each other) or have a part which is shared between multiple pools (its minimum is less than its effective maximum). “Shared” in this case simply means that resources go to the pool that requests the resources first. In the default configuration all resources are shared, thus maintaining backward compatibility with SQL Server 2005 policies.

·         **Classification:**Classification is a set of user-written rules that enable Resource Governor to classify requests into the groups described previously. It is implemented through a scalar Transact-SQL user-defined function (UDF) which is designated as a “classifier UDF” for the Resource Governor.

These concepts are illustrated in the following figure.


Figure 5: Resource Governor example: requests, classification, groups, and pools

The Resource Governor may be used without any application changes.

Integration Services Improvements

Doing ETL to move data from your operational systems into your data warehouse can be a time-intensive task. To make this process faster, SQL Server 2008 Integration Services (SSIS) introduces two important scalability features: improved lookup performance and improved transformation pipeline performance.

Lookup Performance

The Lookup component in SSIS runs faster, and is even easier to program than in SQL Server 2005. A lookup tests whether each row in a stream of rows has a matching row in another dataset. A lookup is like a database join operation. Typically you use lookup within an integration process, such as the ETL layer that populates a data warehouse from source systems.

A lookup builds a cache of retrieved rows pulled from the dataset being probed. In SQL Server 2005, the Lookup component could only get data from specific OleDb connections, and the cache could be populated only by using a SQL query. In SQL Server 2008, the new version of Lookup allows you to populate the cache using a separate pipeline in the same package or a different package. You can use source data from just about anywhere.

SQL Server 2005 reloads the cache every time it is used. For example, if you have two pipelines in the same package that each require the same reference dataset, each Lookup component would cache its own copy. In SQL Server 2008, you can save the cache to virtual memory or permanent file storage. This means that within the same package, multiple Lookup components can share the same cache. You can save the cache to a file and share it with other packages. The cache file format is optimized for speed, and access to it can be orders of magnitude faster than reloading the reference dataset from the original relational source.

In SQL Server 2008, the Lookup component introduces the miss-cache feature. When the component is configured to perform lookups directly against the database, the miss-cache feature saves time by optionally loading into cache the key values that have no matching entries in the reference dataset. For example, if the component receives the value 123 in the incoming pipeline, but the Lookup component already knows that there are no matching entries in the reference dataset, the component will not try again to find 123 in the reference dataset. This reduces a redundant and expensive trip to the database. The miss-cache feature alone can contribute up to a 40% performance improvement in some scenarios.

Other enhancements to the Lookup component include:

·         Optimized I/O routines leading to faster cache loading and lookup operations.

·         More intuitive user interface that simplifies the configuration of the Lookup component, in particular the caching options.

·         Rows in the input that do not match at least one entry in the reference dataset are now sent to the No Match output. The Error output only handles errors such as truncations.

·         Query statements in lookup transformations can be changed at runtime, making programming transformations more flexible.

·         Informational and error messages are improved to help with troubleshooting and performance analysis.

The following figure illustrates a scenario that uses the new Lookup.


Figure 6: Lookup Scenario

Dataflow 1 populates a Cache Connection Manager (CCM) from a custom source, and then Dataflow 2 uses the same CCM to populate the lookup's cache. The figure also shows use of 3 outputs from the Lookup component.

Pipeline Performance

In SQL Server 2008 SSIS, several threads can work together to do the work that a single thread is forced to do by itself in SQL Server 2005 SSIS. This can give you a several-fold speedup in ETL performance.

In SQL Server 2005 SSIS, pipeline parallelism is more coarse-grained. When users have a simple package with one or two execution trees, there are only one or two processors used, and the package might not benefit from a multiprocessor machine with more than a few processors. Even if users logically split the data flow by using multicast, all output paths of a multicast belong to the same execution tree, and they are executed serially by the SQL Server 2005 SSIS data flow task.

To achieve a high level of parallelism, pipelines in SQL Server 2008 SSIS allow more parallel processing, which means that for any multiprocessor machine this should result in faster performance.

By using a shared thread pool, multiple outputs of a multicast can be executed simultaneously. In short, the multicast gives an ability to have an active buffer on each output and not just have one buffer (and one active thread), which is passed to each output. You do not need to use the “Union All” trick as a workaround to introduce more parallelism.

For example, suppose you have a flow that contains a multicast with four outputs. Each output flows into an aggregate. In SQL Server 2005 SSIS, only one of the aggregates is processed at a time. In SQL Server 2008 SSIS, all four aggregates can be processed in parallel.

The following figure shows how the enhanced SQL Server 2008 pipeline parallelism works.


Figure 7: Improved pipeline parallelism in Integration Services

Analysis Services Improvements

SQL Server 2008 Analysis Services (SSAS) dramatically improves query speed with the new block computation, writeback, and scalable shared database performance features. Manageability also improves with the ability to back up much larger databases.

MDX Query Performance: Block Computation

Improved block computation in SQL Server 2008 SSAS speeds up MDX query processing primarily by doing work just for the non-null values in a cube space. No time is wasted evaluating null cells. The key idea behind subspace computation is best introduced by contrasting it with a “naïve” cell-by-cell evaluation of a calculation. Consider a calculation RollingSum that sums the sales for the previous year and the current year, and a query that requests the RollingSum for 2005 for all products.

      RollingSum = (Year.PrevMember, Sales) + Sales

      SELECT 2005 on columns, Product.Members on rows WHERE RollingSum

A cell-by-cell evaluation of this calculation proceeds as shown in the following figure. 


Figure 8: Cell-by-cell evaluation example

The 10 cells for [2005, all products] are evaluated in turn. For each, we navigate to the previous year, obtain the sales value, and add it to the sales for the current year. There are two significant performance issues with this approach.

First, if the data is sparse, cells are calculated even though they are bound to return a null value. In the example, calculating the cells for anything but Product3 and Product6 is a waste of effort. The impact of this can be extreme—in a sparsely populated cube, the difference can be several orders of magnitude in the number of cells evaluated.

Second, even if the data is totally dense, meaning that every cell has a value and there is no wasted effort visiting empty cells, there is much repeated effort. The same work (such as getting the previous Year member, setting up the new context for the previous Year cell, checking for recursion) is re-done for each Product. It would be much more efficient to move this work out of the inner loop of evaluating each cell.

Now consider the same example performed using a subspace computation approach. First, we work our way down an execution tree, determining which spaces need to be filled. Given the query, we need to compute the space:

[Product.*, 2005, RollingSum]

Given the calculation, this means we must first compute the space:

[Product.*, 2004, Sales]

followed by the space:

[Product.*, 2005, Sales]

and then apply the ‘+’ operator to those two spaces.

Sales is a base measure, so we simply obtain the storage engine data to fill the two spaces at the leaves, and then work up the tree, applying the operator to fill the space at the root. Hence the one row (Product3, 2004, 3) and the two rows { (Product3, 2005, 20), (Product6, 2005, 5)} are retrieved, and the + operator applied to them to yield the result.


Figure 9: Block computation example that avoids doing work for NULL cells

The + operator operates on spaces, not simply scalarvalues. It is responsible for combining the two given spaces to produce a space that contains each product that appears in either space, with the summed value.   

We only operate on data that could contribute to the result. There is no notion of the complete space over which we must perform the calculation

Query and Writeback Performance

The performance of writeback operations, and queries on writeback data, is improved in SQL Server 2008 Analysis Services. Cell writeback in Analysis Services is the ability for end users to update cell values at the leaf or aggregate levels. Cell writeback uses a special writeback partition per measure group which stores the difference (delta) between the updated cell value and the original value. When an MDX query requests cell data from this measure group, the storage engine accesses all partitions including the writeback partition and aggregates the results to produce the right cell value.

In SQL Server 2005 and earlier releases, Analysis Services required writeback partitions to have ROLAP storage. This is a common cause for performance issues in cell writeback since ROLAP partitions query the relational data source on demand to retrieve their data. In SQL Server 2008, we allow writeback partitions with MOLAP storage. Retrieving writeback data from the compressed MOLAP format is much faster than querying the relational data source. Hence, MOLAP writeback partitions have better query performance than ROLAP. The extent of the performance improvement varies and depends on a number of factors including the volume of writeback data and the nature of the query.

MOLAP writeback partitions should also improve cell writeback performance since the server internally sends queries to compute the writeback deltas and these queries probably access the writeback partition. Note that the writeback transaction commit can be a bit slower since the server must update the MOLAP partition data in addition to the writeback table, but this should be insignificant compared with the other performance gains.

Analysis Services Enhanced Backup

In SQL Server 2008 Analysis Services one of the performance enhancements you’ll find is a new backup storage subsystem. The backup storage subsystem now has been rewritten to allow for greater performance and scalability. The changes are transparent to your application – no code change is necessary to take advantage of them.

The new backup storage subsystem had introduced a new format to the Analysis Services backup files. The file name extension hasn’t changed. However, the internal format is different, so backup can scale well to handle databases over a terabyte in size.

SQL Server 2008 Analysis Services backup is fully backward compatible with SQL Server 2005 Analysis Services. It allows you to restore databases backed up in SQL Server 2005 Analysis Services. SQL Server 2008 Analysis Services does not have ability to store backups in the old format for use in SQL Server 2005 Analysis Services.

The new highly performing backup storage subsystem allows customer to implement new backup scenarios. Where previously you had to rely on the raw file system copy utilities to back up large databases, now you have the option to use the built-in backup subsystem that is integrated with the transactional system, and allows running backup in parallel to other operations.

Scalable Shared Database for AS

Now you can scale out your OLAP query workload across many small servers with just a single copy of the database. SQL Server 2008 Analysis Services supports this with a feature called scalable shared database (SSD).

Scale out can apply to various scenarios and workloads such as processing, queries, data and cache management. The most common scale-out scenario for Analysis Services is to spread the query load across multiple servers in response to increasing number of concurrent users. This has been achieved in the past by placing a load balancing solution such as Microsoft Network Load Balancing (NLB) capability in front of multiple servers and replicating the physical data between the servers. Managing such an environment poses various challenges and the data replication is a major one. The scalable shared database feature enables DBAs to mark a database as read only and share it across multiple server instances from a Storage Area Network (SAN), thereby eliminating the need to replicate the data. This saves disk space, and the time that otherwise would have been spent copying data.

The following figure illustrates a typical SSD configuration.


Figure 10: Scalable shared database for Analysis Services

An alternate solution for improving performance is scale up, where a single large server is used instead of multiple small servers. The advantage of scale up is that a single query can typically be processed faster on a larger machine. But using scale out via SSD can save you money on hardware (given the lower cost per processor) and still satisfy your needs for many multi-user workloads. Furthermore, SSD allows you to scale to more processors than can be accommodated in a single large server.

The scalable shared database feature consists of three logical parts:

·         Read-only database: Allows marking a database as read-only

·         Database storage location: Allows a database to reside outside the server Data folder

·         Attach/detach database: Allows attaching or detaching a database from any UNC path

These features together enable the query scale-out scenario. However, each feature is independent and has usages outside of query scale out as well.

The SSD for AS feature works in a similar way as the SSD feature introduced in the SQL Server 2005 relational database.

Reporting Services Improvements

SQL Server 2008 Reporting Services (SSRS) provides performance, scale, and design improvements that make it a great choice for your enterprise reporting needs. We highlight the two major scalability improvements here.

Reporting Scalability

The SQL Server 2008 Reporting Services reporting engine has had a major upgrade from the prior release, so that it can render much larger reports than it could before. Although this is not specifically a data warehousing improvement (it is useful in operational reporting too), it is useful in some data warehousing scenarios. If you create reports with hundreds or thousands of pages, SQL Server 2008 Reporting Services helps you to render the reports faster. Moreover, the size of the largest report that can be rendered has been increased dramatically, given the same hardware configuration.

Server Scalability

SQL Server 2008 Reporting Services does not run inside Internet Information Server (IIS). It can manage its own memory, and has its own memory limits. This allows you to configure the memory settings so SSRS can run on the same computer more effectively with other services, such as SQL Server.


SQL Server gives you everything you need for data warehousing. With the 2008 release, it scales to meet the needs of the largest enterprises more readily than ever. As illustrated by the many data warehouse scale enhancements introduced in this paper, it is a major advance over previous releases. The number one change you’ll see is improved scalability across the board, for data warehouse construction, relational query processing, reporting, and analysis.


For more information:

·         SQL Server Web site

·         SQL Server TechCenter

·         SQL Server Developer Center


Bloom filter, Wikipedia 2007,

Hanson, Eric.,Improving Performance with SQL Server 2005 Indexed Views,