Analysis Services Query Performance Top 10 Best Practices
Proper cube design, efficient multidimensional expressions (MDX), and sufficient hardware resources are critical to optimal performance of MDX queries issued against a SQL Server 2005 Analysis Services instance. This article lists the ten best practices that the Microsoft SQL Server development team most commonly recommends for optimizing Analysis Services query performance. For additional discussion of Analysis Services best practices related to query performance, see the SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.
Optimize cube and measure group design
Define cascading attribute relationships (for example Day > Month > Quarter > Year) and define user hierarchies of related attributes (called natural hierarchies) within each dimension as appropriate for your data. Attributes participating in natural hierarchies are materialized on disk in hierarchy stores and are automatically considered to be aggregation candidates. User hierarchies are not considered to be natural hierarchies unless the attributes comprising the levels are related through cascading attribute relationships. With SQL Server 2005 Service Pack 2 (SP2), a warning appears in Business Intelligence Development Studio with each user hierarchy that is not defined as a natural hierarchy.
Remove redundant relationships between attributes to assist the query execution engine in generating the appropriate query plan. Attributes need to have either a direct or an indirect relationship to the key attribute, not both.
Keep cube space as small as possible by only including measure groups that are needed.
Place measures that are queried together in the same measure group. A query that retrieves measures from multiple measure groups requires multiple storage engine operations. Consider placing large sets of measures that are not queried together into separate measure groups to optimize cache usage, but do not explode the number of measure groups.
Minimize the use of large parent-child hierarchies. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute (the All attribute, unless it has been disabled). As a result, queries returning cells at intermediate levels are calculated at query time and can be slow for large parent-child dimensions. If you are in a design scenario with a large parent-child hierarchy (more than 250,000 members), you may want to consider altering the source schema to reorganize part or all of the hierarchy into a user hierarchy with a fixed number of levels.
Optimize many-to-many dimension performance, if used. When you query the data measure group by the many-to-many dimension, a run-time “join” is performed between the data measure group and the intermediate measure group using the granularity attributes of each dimension that the measure groups have in common. Where possible, reduce the size of the intermediate fact table underlying the intermediate measure group. To optimize the run-time join, review the aggregation design for the intermediate measure group to verify that aggregations include attributes from the many-to-many dimension.
To understand how to optimize dimensions to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005. For assistance in analyzing your design for compliance with best practices, see the February 2007 Community Technology Preview (CTP) release of the SQL Server 2005 Best Practices Analyzer (the final version should be released soon).
Define effective aggregations
Define aggregations to reduce the number of records that the storage engine needs to scan from disk to satisfy a query. If SQL Server Profiler traces indicate that most user queries that are not resolved from cache are resolved by partition reads rather than aggregation reads, consider using the Aggregation Manager sample application to design custom aggregations. This sample is available on CodePlex at https://www.codeplex.com/MSFTASProdSamples and a version of this sample that has been updated by the community is available on CodePlex at https://www.codeplex.com/bidshelper.
Avoid designing an excessive number of aggregations. Excessive aggregations reduce processing performance and may reduce query performance. While the optimum number of aggregations varies, the SQL Server Best Practices team’s experience has been that, in almost all cases, the optimum number is in the tens, not hundreds or thousands.
Enable the Analysis Services query log to capture user query patterns and use this query log when designing aggregations. For more information, see Configuring the Analysis Services Query Log.
To understand how to design aggregations to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.
Use partitions
Define partitions to enable Analysis Services to query less data to resolve a query when it cannot be resolved from the data cache or from aggregations. Also define the partitions to increase parallelism when resolving queries.
For optimum performance, partition data in a manner that matches common queries. A very common choice for partitions is to select an element of time such as day, month, quarter, year or some combination of time elements. Avoid partitioning in a way that requires most queries to be resolved from many partitions.
In most cases, partitions should contain fewer than 20 million records, and each measure group should contain fewer than 2,000 total partitions. Also, avoid defining partitions containing fewer than two million records. Too many partitions cause a slowdown in metadata operations, and too few partitions can result in missed opportunities for parallelism.
Define a separate ROLAP partition for real-time data and place the real-time ROLAP partition in its own measure group.
To understand how to design partitions to increase query performance, refer to the SQL Server 2005 Analysis Services Performance Guide, the Microsoft SQL Server Customer Advisory Team blog, and OLAP Design Best Practices for Analysis Services 2005.
Write efficient MDX
Remove empty tuples from your result set to reduce the time spent by the query execution engine serializing the result set.
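As a minimal sketch (assuming an Adventure Works-style cube; the cube, measure, and hierarchy names below are placeholders), the NON EMPTY keyword on an axis discards empty tuples before the result set is serialized:

    -- NON EMPTY discards customers with no sales before the row axis is serialized
    SELECT
        { [Measures].[Internet Sales Amount] } ON COLUMNS,
        NON EMPTY [Customer].[Customer].[Customer].Members ON ROWS
    FROM [Adventure Works]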
Avoid run-time checks in an MDX calculation that result in a slow execution path. If you use the CASE and IIF functions to perform condition checks that must be resolved many times during query resolution, you will have a slow execution path. Where possible, rewrite these calculations as SCOPE statements in the MDX script to quickly reduce the calculation space to which an MDX calculation refers. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR, Comparing Levels in MDX and CONDITION vs. SCOPE in cell calculations, and Multiselect friendly MDX calculations.
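As a hedged sketch of the pattern (the measure and member names and the 10 percent uplift are placeholders, and the two definitions are shown side by side only for comparison; a real MDX script would contain one of them), a condition that IIF re-evaluates cell by cell can often be expressed once as a scoped assignment:

    -- Slow path: the condition is re-evaluated for every cell the calculation touches
    CREATE MEMBER CURRENTCUBE.[Measures].[Sales Forecast] AS
        IIF( [Date].[Fiscal Year].CurrentMember IS [Date].[Fiscal Year].&[2008],
             [Measures].[Sales Amount] * 1.1,
             [Measures].[Sales Amount] );

    -- Faster path: the condition becomes a subcube, so no per-cell check is needed
    CREATE MEMBER CURRENTCUBE.[Measures].[Sales Forecast] AS
        [Measures].[Sales Amount];
    SCOPE( [Measures].[Sales Forecast], [Date].[Fiscal Year].&[2008] );
        THIS = [Measures].[Sales Amount] * 1.1;
    END SCOPE;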
Use Non_Empty_Behavior where possible to enable the query execution engine to use bulk evaluation mode. However, if you use Non_Empty_Behavior incorrectly, you will return incorrect results. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR and Averages, ratios, division by zero and NON_EMPTY_BEHAVIOR.
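For example (a sketch with placeholder measure names), Non_Empty_Behavior can be attached to a calculated member whose value really is empty exactly where the named measure is empty; if that assumption does not hold, the results will be wrong:

    CREATE MEMBER CURRENTCUBE.[Measures].[Average Unit Price] AS
        [Measures].[Sales Amount] / [Measures].[Order Quantity],
        -- safe only because the ratio is empty exactly where Order Quantity is empty
        NON_EMPTY_BEHAVIOR = { [Measures].[Order Quantity] };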
Use EXISTS rather than filtering on member properties to avoid a slow execution path. Use the NonEmpty and Exists functions to enable the query execution engine to use bulk evaluation mode.
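As an illustration (placeholder attribute and measure names, assuming Gender is modeled as a related attribute of the Customer dimension), filtering on a member property forces cell-by-cell evaluation, whereas Exists against the related attribute can be evaluated in bulk:

    -- Slow: the Gender member property is inspected for every customer
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
        FILTER( [Customer].[Customer].[Customer].Members,
                [Customer].[Customer].CurrentMember.Properties("Gender") = "M" ) ON ROWS
    FROM [Adventure Works]

    -- Faster: the set is reduced through the attribute relationship
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
        EXISTS( [Customer].[Customer].[Customer].Members,
                [Customer].[Gender].&[M] ) ON ROWS
    FROM [Adventure Works]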
Perform string manipulations within Analysis Services stored procedures using server-side ADOMD.NET rather than with string manipulation functions such as StrToMember and StrToSet.
Rather than using the LookupCube function, use multiple measure groups in the same cube wherever possible.
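As a hedged sketch (the cube and measure names are placeholders), when the related measures live in measure groups of the same cube, a plain tuple reference replaces the cross-cube lookup:

    -- Avoid: LookupCube is resolved outside the current cube's context, cell by cell
    WITH MEMBER [Measures].[Sales vs Quota] AS
        [Measures].[Sales Amount] /
        LOOKUPCUBE( "Sales Quotas", "( [Measures].[Sales Amount Quota] )" )
    SELECT [Measures].[Sales vs Quota] ON COLUMNS
    FROM [Adventure Works]

    -- Prefer: both measure groups in the same cube, referenced directly
    WITH MEMBER [Measures].[Sales vs Quota] AS
        [Measures].[Sales Amount] / [Measures].[Sales Amount Quota]
    SELECT [Measures].[Sales vs Quota] ON COLUMNS
    FROM [Adventure Works]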
Rewrite MDX queries containing arbitrary shapes to reduce excessive subqueries where possible. An arbitrary-shaped set is a set of members that cannot be expressed as a crossjoin of sets, each drawn from a single hierarchy. For example, the set {(Gender.Male, Customer.USA), (Gender.Female, Customer.Canada)} is an arbitrary set. You can frequently use the Descendants function to express such shapes with fewer subqueries than equivalent queries written using other functions.
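For example (placeholder hierarchy names; the rewrite is only valid when the arbitrary set really means "a member plus everything beneath it"), a slicer that mixes a country with individual states is an arbitrary shape, while Descendants expresses the intended region as a single shape:

    -- Arbitrary shape: members at mixed grains in the slicer
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS
    FROM [Adventure Works]
    WHERE ( { [Customer].[Customer Geography].[Country].&[United States],
              [Customer].[Customer Geography].[State-Province].&[WA]&[US],
              [Customer].[Customer Geography].[State-Province].&[OR]&[US] } )

    -- Single shape: the country and all of its states, expressed with Descendants
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS
    FROM [Adventure Works]
    WHERE ( DESCENDANTS( [Customer].[Customer Geography].[Country].&[United States],
                         [Customer].[Customer Geography].[State-Province],
                         SELF_AND_BEFORE ) )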
Rewrite MDX queries that result in excessive prefetching where possible. Prefetching is a term used to describe cases where the query execution engine requests more information from the storage engine than is required to resolve the query at hand for reasons of perceived efficiency. Generally, prefetching is the most efficient data retrieval choice. However, occasionally it is not. In some cases you may be able to eliminate excessive prefetching by rewriting queries with a subselect in the FROM clause rather than a set in the WHERE clause. When you cannot eliminate excessive prefetching, you may need to disable prefetching and warm the cache using the Create Cache statement. For more information, see How to Warm up the Analysis Services data cache using Create Cache statement.
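As a sketch (placeholder names; note that a subselect does not set the query context the way a WHERE clause does, so verify that both forms answer the same question for your calculations), a multi-member slicer can be moved into a subselect in the FROM clause:

    -- Set in the WHERE clause
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
           [Date].[Calendar].[Month].Members ON ROWS
    FROM [Adventure Works]
    WHERE ( { [Product].[Category].&[1], [Product].[Category].&[3] } )

    -- Same question expressed as a subselect in the FROM clause
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
           [Date].[Calendar].[Month].Members ON ROWS
    FROM ( SELECT { [Product].[Category].&[1], [Product].[Category].&[3] } ON COLUMNS
           FROM [Adventure Works] )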
Filter a set before using it in a crossjoin to reduce the cube space before performing the crossjoin.
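For instance (a sketch with placeholder sets and an arbitrary threshold; the two queries are not automatically equivalent because the predicate in the second is evaluated at the Product grain only, so treat this as an illustration of the ordering principle rather than a drop-in rewrite), reducing one set before the crossjoin keeps the engine from materializing the full product of the two sets:

    -- Filter applied to the full product of the two sets
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
        FILTER( CROSSJOIN( [Product].[Subcategory].[Subcategory].Members,
                           [Customer].[Country].[Country].Members ),
                [Measures].[Internet Sales Amount] > 1000 ) ON ROWS
    FROM [Adventure Works]

    -- Reduce one set first, then crossjoin the smaller result
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
        CROSSJOIN( FILTER( [Product].[Subcategory].[Subcategory].Members,
                           [Measures].[Internet Sales Amount] > 1000 ),
                   [Customer].[Country].[Country].Members ) ON ROWS
    FROM [Adventure Works]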
Use the query engine cache efficiently
Ensure that the Analysis Services computer has sufficient memory to store query results in memory for re-use in resolving subsequent queries. To monitor, use the MSAS 2005: Memory/Cleaner Memory Shrinkable KB and the MSAS 2005: Cache/Evictions/sec Performance Monitor counters.
Define calculations in the MDX script. Calculations in the MDX script have a global scope that enables the cache related to these queries to be shared across sessions for the same set of security permissions. However, calculated members defined using Create Member and With Member within user queries do not have global scope and the cache related to these queries cannot be shared across sessions.
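As an illustration (placeholder measure names), the same calculation defined in the MDX script is cached with global scope, while the WITH MEMBER version is recomputed for each query and its cache cannot be shared:

    -- In the cube's MDX script: global scope, cache shared across sessions
    CREATE MEMBER CURRENTCUBE.[Measures].[Gross Margin Pct] AS
        ( [Measures].[Sales Amount] - [Measures].[Total Product Cost] )
          / [Measures].[Sales Amount],
        FORMAT_STRING = "Percent";

    -- Inside a user query: query scope only, so the related cache is not shared
    WITH MEMBER [Measures].[Gross Margin Pct] AS
        ( [Measures].[Sales Amount] - [Measures].[Total Product Cost] )
          / [Measures].[Sales Amount]
    SELECT [Measures].[Gross Margin Pct] ON COLUMNS
    FROM [Adventure Works]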
Warm the cache by executing a set of predefined queries using the tool of your choice. You can also use a Create Cache statement for this purpose. For more information on using the Create Cache statement, see How to Warm up the Analysis Services data cache using Create Cache statement. For information on how to use SQL Server 2005 Integration Services to warm the cache, see Build Your Own Analysis Services Cache-Warmer in Integration Services.
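A minimal sketch of the statement follows (the cube, measure, and hierarchy names are placeholders; see the referenced article for the exact syntax and options supported by your build):

    -- Pre-loads the cache for a commonly queried subcube of the placeholder cube
    CREATE CACHE FOR [Adventure Works] AS
        ( { [Measures].[Internet Sales Amount] },
          { [Date].[Calendar Year].[Calendar Year].Members },
          { [Product].[Category].[Category].Members } )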
Rewrite MDX queries containing arbitrary shapes to optimize caching. For example, in some cases you can rewrite queries that require non-cached disk access such that they can be resolved entirely from cache by using a subselect in the FROM clause rather than a WHERE clause. In other cases, a WHERE clause may be a better choice.
Ensure flexible aggregations are available to answer queries.
Note that incrementally updating a dimension by using ProcessUpdate drops all flexible aggregations affected by updates and deletes and, by default, does not re-create them until the next full process.
Ensure that aggregations are re-created by processing affected objects, configuring lazy processing, performing ProcessIndexes on affected partitions, or performing full processing on affected partitions.
To understand how to ensure flexible aggregations are not dropped, refer to the SQL Server 2005 Analysis Services Performance Guide.
Tune memory usage
Increase the size of the paging files on the Analysis Services server or add additional memory to prevent out-of-memory errors when the amount of virtual memory allocated exceeds the amount of physical memory on the Analysis Services server.
Use Microsoft Windows® Advanced Server or Datacenter Server with SQL Server 2005 Enterprise Edition (or SQL Server 2005 Developer Edition) when you are using SQL Server 2005 (32-bit) to enable Analysis Services to address up to 3 GB of memory. To enable Analysis Services to address more than 2 GB of physical memory with either of these editions, use the /3GB switch in the boot.ini file. If you set the /3GB switch in the boot.ini file, the server should have at least 4 GB of memory to ensure that the Windows operating system also has sufficient memory for system services.
Reduce the value for the Memory/LowMemoryLimit property below 75 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
Reduce the value for the Memory/TotalMemoryLimit property below 80 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
Keep a gap between the Memory/LowMemoryLimit property and the Memory/TotalMemoryLimit property; a gap of 20 percent is frequently used.
When query thrashing is detected in a multi-user environment, contact Microsoft Support for assistance in modifying the MemoryHeapType.
When running on non-uniform memory access (NUMA) architecture and VirtualAlloc takes a very long time to return or appears to stop responding, upgrade to SQL Server 2005 SP2 and contact Microsoft Support for assistance with appropriate settings for pre-allocating NUMA memory.
To understand when to consider changing default memory use, refer to the SQL Server 2005 Analysis Services Performance Guide and Microsoft SQL Server Customer Advisory Team blog.
Tune processor usage
To increase parallelism during querying for servers with multiple processors, consider modifying the Threadpool\Query\MaxThreads and Threadpool\Process\MaxThreads options to be a number that depends on the number of server processors.
A general recommendation is to set the Threadpool\Query\MaxThreads to a value of less than or equal to two times the number of processors on the server. For example, if you have an eight-processor server, the general guideline is to set this value to no more than 16. In practical terms, increasing the Threadpool\Query\MaxThreads option will not significantly increase the performance of a given query. Rather, the benefit of increasing this property is that you can increase the number of queries that can be serviced concurrently.
A general recommendation is to set the Threadpool\Process\MaxThreads option to a value of less than or equal to 10 times the number of processors on the server. This property controls the number of threads used by the storage engine during querying operations as well as during processing operations. For example, if you have an eight-processor server, the general guideline is to set this value to no more than 80. Note that even though the default value is 64, if you have fewer than eight processors on a given server, you do not need to reduce the default value to throttle parallel operations.
While modifying the Threadpool\Process\MaxThreads and Threadpool\Query\MaxThreads properties can increase parallelism during querying, you must also consider the additional impact of the CoordinatorExecutionMode option. For example, if you have a four-processor server and you accept the default CoordinatorExecutionMode setting of -4, a total of 16 jobs can be executed at one time across all server operations. So if 10 queries are executed in parallel and require a total of 20 jobs, only 16 jobs can launch at a given time (assuming that no processing operations are being performed at that time). When the job threshold has been reached, subsequent jobs wait in a queue until a new job can be created. Therefore, if the number of jobs is the bottleneck to the operation, increasing the thread counts may not necessarily improve overall performance.
Scale up where possible
Use a 64-bit architecture for all large systems.
Add memory and processor resources and upgrade the disk I/O subsystem to alleviate query performance bottlenecks on a single system.
Avoid linking dimensions or measure groups across servers and avoid remote partitions whenever possible because these solutions do not perform optimally.
Scale out when you can no longer scale up
If your performance bottleneck is processor utilization on a single system as a result of a multi-user query workload, you can increase query performance by using a cluster of Analysis Services servers to service query requests. Requests can be load balanced across two Analysis Services servers, or across a larger number of Analysis Services servers to support a large number of concurrent users (this is called a server farm). Load-balancing clusters generally scale linearly.
When using a cluster of Analysis Services servers to increase query performance, perform processing on a single processing server and then synchronize the processing and the query servers by using the XMLA Synchronize statement, by copying the database directory with Robocopy or some other file copy utility, or by using the high-speed copy facility of SAN storage solutions.