Business Intelligence: Building Your First Cube
Stacia Misner
I like to introduce people to Microsoft SQL Server Analysis Services by creating a simple cube that contains a sampling of their own data. I have to be careful when I do this, though, because I have been pushed out of my chair on more than one occasion when new users got excited about what they were seeing and wanted to dive right in to explore the data in ways they never before thought possible. Whether you have data of your own you want to explore more easily or a data-hungry user community to support, you'll find SQL Server Analysis Services (SSAS) provides the tools.
This article is the third installment of a series; the first two articles appeared in the August 2009 issue and are available at technet.microsoft.com/magazine/ee263919.aspx. This series will help you develop a better understanding of business intelligence (BI) by using the components available in Microsoft SQL Server 2008 as you build a simple BI solution. Because I'll be referring to concepts and terminology introduced in the previous two articles, I recommend you take the time to review those articles first if you haven't read them already.
In "Planning Your First BI Solution," I explained the advantages of building a data mart compared with retrieving data directly from a corporate database. I also provided instructions for downloading the sample databases you'll need in order to follow the steps described in these articles. Then Derek Comingore, in "Building a Data Foundation for a BI Solution," explained how to use Integration Services to populate a data mart with data from your corporate database. In this article, I'll show you how to build a cube based on a similar data mart design.
Why Should You Build a Cube?
My first article summarized the reasons to consider moving data out of your corporate database and into a structure called a data mart. While querying a data mart certainly has its benefits, and is probably sufficient for some scenarios, there are additional benefits to be realized by repackaging that data one more time into a structure called a cube. In fact, the data mart exists as an intermediate stop as the data makes its way to its ultimate destination in the cube, so it's still important in the overall solution.
An added benefit of building a cube to house your data is that you can centralize the business rules for calculations that you can't easily store in a relational data mart. Furthermore, the structure of the cube makes it much easier to write queries to compare data year over year, or to create cumulative values such as year-to-date sales.
Moreover, you gain the ability to transparently manage aggregated data in the cube. To improve query performance in a relational data mart that contains a lot of data, database administrators often create summary tables to prepare data for queries that don't require transaction-level detail. SSAS creates the logical equivalent of summary tables (called aggregations) and keeps them up-to-date.
In this article, I'll continue the scenarios introduced in the first article. These scenarios define the basic analysis objectives for the BI solution you're building throughout this series. The solution must be able to show which sales channel is more profitable for Adventure Works, Internet Sales or Reseller Sales, and whether sales trends indicate that demand is growing or declining for certain products. Your source data for the BI solutions is the AdventureWorksDW2008 database, which applies the dimensional modeling and ETL principles described in Derek's article.
You can download the sample databases used to build the solution for Adventure Works from CodePlex at msftdbprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=18407. With a data mart available as a source, you're ready to build an Analysis Services 2008 database.
Preparing Your Analysis Services Database
Your first step is to create a project in Business Intelligence Development Studio (BIDS). Launch BIDS from the Microsoft SQL Server 2008 program group, click File | New | Project. In the New Project dialog box, select Analysis Services Project. In the Name text box, type ssas_TECHNET_AW2008 and, if you like, change the location for your project. Click OK to create the project.
Now add a data source to define the connection string for your data mart. In Solution Explorer, right-click the Data Sources folder and click New Data Source. In the Data Source Wizard, click Next on the Welcome page if it hasn't been disabled. On the Select How to Define the Connection page, click New to set up a new connection. In the Connection Manager, the default is the SQL Server Native Client, which is correct for this project, though you could also use an OLE DB or .NET provider to access other types of data.
To define the connection, type the name of your server in the Server Name text box or select it from the drop-down list, then select AdventureWorksDW2008 in the database drop-down list and click OK. When you're back in the Data Source Wizard, click Next. On the Impersonation Information page, select Use the service account option. The service account is used to read data from the source when loading data into your SSAS database and must have read permissions to do so. Click Next and then Finish to complete the wizard.
Creating a Data Source View
Next, you create a data source view (DSV) as an abstraction of the tables (or views) from the data source you want to use to define dimensions and cubes. You can make changes to the DSV without modifying the underlying data source, which is handy if you have only read permissions to the data mart and can't make changes at the source. In Solution Explorer, right-click the Data Source Views folder and then click New Data Source View.
Click Next on the Welcome page if necessary. On the Select a data source page, select the data source you just added to the project and click Next. Add objects to the DSV by double-clicking each table or view you need. To create a DSV that helps you answer the BI questions posed at the beginning of this article, add the following tables to the DSV: DimDate, DimProduct, DimProductCategory, DimProductSubcategory, FactInternetSales and FactResellerSales. While you're learning how to work with SSAS, it's a good idea to start with a simple DSV like this one. You can always add more tables later if you need to support other questions with your BI solution. Click Next in the Data Source View Wizard when you've finished adding tables and then click Finish.
I recommend you simplify the names of your tables by selecting each one in the Tables pane of the DSV designer and removing the Dim and Fact prefixes from the FriendlyName property of each table. The wizards you use to create dimensions and cubes will then use the value in the FriendlyName property to assign names to objects. The complete DSV is shown in Figure 1.
Figure 1: Data Source View
In addition to changing a table's properties in the DSV, you can define logical primary keys or logical relationships between tables if the physical source does not have them defined already. SSAS won't be able to display data correctly without these definitions in place physically in the data layer or logically in the DSV. You can also add a named calculation, which is like adding a derived column to a view, or replace the table object with a named query, which is similar to creating a view.
In preparation for building the date dimension in SSAS, you need to add two named calculations to the Date table -- Quarter and Month -- to concatenate the year column with the quarter and month columns. Otherwise, data will not roll up correctly by month, quarter and year when you want to view summarized data. To add the named calculation for Quarter, right-click on the Date table in either the designer or the Tables pane and click New Named Calculation. In the Column Name text box, type Quarter. In the Expression text box, type the following expression:
'Qtr ' + convert(char(1), CalendarQuarter) + ' ' + convert(char(4), CalendarYear)
Click OK and now repeat these steps to add the named calculation for Month, using the following expression:
left(EnglishMonthName, 3) + ' ' + convert(char(4), CalendarYear)
Notice that the Date table in the designer (shown in Figure 2) displays your named calculations with a calculator icon to differentiate them from columns physically in the table. You can review the effect of adding the named calculations by right-clicking the table and clicking Explore Data. A new window opens in BIDS to display the data from the source table.
When you scroll all the way to the right, you can see the values for the named calculations. You should always use the Explore Data command to confirm that the named calculations display the way you intend before you proceed to building your dimensions.
Defining Dimensions in Analysis Services
After adding dimension tables to the DSV, you're ready to create dimensions in the SSAS database. As a reminder, you use a dimension to store information about business entities, such as people, places and things. To be able to answer the BI questions posed at the beginning of this article, you need to create dimensions for Date and Product.
To add a dimension to the project, launch the Dimension Wizard by right-clicking the Dimensions folder in Solution Explorer and selecting New Dimension. On the Select Creation Method page, keep the default selection to use an existing table because your DSV includes the DimDate table. For those situations when you might build a simple database based on a small transactional table that includes dates, you might not build out a data mart and can instead use one of the other options on this page to generate a time table in the data source or on the server.
Figure 2 Date Table with Named Calculations
In either case, you supply a beginning date and an end date for the data in the transactional table. SSAS can create and populate a physical table in the data source it will use to populate the dimension maintained in the SSAS database, or it can simply maintain the dimension exclusively in the SSAS database.
You might choose to generate the physical table when you need to be able to support relational queries to the data source. Click Next to continue.
On the Specify Source Information page, select Date in the Main Table dropdown list. Note the Key Column is DateKey. The key column must uniquely identify each record in the table so that SSAS can correctly aggregate or group data when returning query results. Then, in the Name Column dropdown list, select FullDateAlternateKey. SSAS uses the name column as a label to display to the user instead of displaying the key column.
Click Next.
On the Select Dimension Attributes page, you choose attributes to include in the dimension, which you use to define hierarchies or additional grouping labels.
Whether you include all attributes from the dimension table depends on the types of questions you want to support in your BI solution. Add only as many as you need to ensure SSAS performs optimally, to avoid consuming storage space unnecessarily and to avoid confusing users with too many options. For this dimension, select Calendar Year, Quarter and Month. Click Next and then click Finish.
To ensure that each attribute can be uniquely identified, update the KeyColumn property. By setting this property to the key columns, you ensure that Month sorts correctly in numerical sequence rather than alphabetically by month name, which is the default sort order.
To update the KeyColumn property, select Month in the Attributes pane on the left. In the Properties window, click the KeyColumns property and then the ellipsis button. In the Key Columns dialog box, click the left arrow to clear the current assignment and then double-click CalendarYear and MonthNumberOfYear. Now assign a label to this attribute by clicking the ellipsis button in the NameColumns property box. Select EnglishMonthName, and click OK.
Repeat these steps to set the KeyColumns property of the Quarter attribute to CalendarYear and CalendarQuarter and to set the NameColumns property to Quarter.
The dimension now appears in Solution Explorer as Date.dim and the dimension designer (shown in Figure 3) displays in the main workspace area in BIDS.
Figure 3 Date Dimension Shown in the Designer
You can see the dimension with its associated attributes, including the key attribute, in the Attributes pane on the left. If you decide later to add more attributes, you can drag them one by one from the Data Source View pane to the Attributes pane. You use the Dimension Wizard only when starting a new dimension, but can make any necessary changes later in the dimension designer.
Now use the Dimension Wizard to create the Product dimension, using Product as the main table and EnglishProductName as the name column. Because the Product dimension is a snowflake schema, the wizard includes an additional page to confirm that you want to include the related tables, ProductSubcategory and ProductCategory.
Next, add the Color and Size attributes to the dimension. Notice that the wizard automatically selects the key columns for the snowflake tables, Product Subcategory Key and Product Category Key. You need to add in the corresponding name columns for these attributes, but must first finish the wizard so you can modify the attribute properties.
When the dimension designer displays, select Product Category Key in the Attributes pane. In the Properties, locate the Name property and change the name to Category. Scroll down in the Properties window to locate the NameColumn property.
Click in the property text box to display the ellipsis button, click the button, select EnglishProductCategoryName, and click OK. Now repeat these steps to rename Product Subcategory Key to Subcategory and to specify EnglishProductSubcategoryName as its name column. Last, rename the Product Key attribute as Product.
Adding Hierarchies
Notice the blue wavy line that appears under the dimension name, Product, in the Attributes pane. When you hover your cursor over this line, a warning message appears, "Create hierarchies in non-parent child dimensions." This message is an example of the best practices warnings built into SSAS 2008 to help you build an SSAS database correctly. Return to the Date dimension designer, either by clicking the applicable tab in the document workspace or by double-clicking Date.dim in Solution Explorer, and you'll see the same warning there.
Adding a hierarchy to a dimension is considered a best practice for several reasons, in particular for user-friendliness and optimization purposes. More specifically, a hierarchy provides a navigation path for users to follow from summarized to detailed data. Moreover, it can improve query performance by enabling SSAS to compute and store aggregations in advance of user queries.
For example, if the user wants to see reseller sales by year and a hierarchy in the Date dimension defines the rollup path from the key attribute (Date Key) to the Calendar Year attribute, SSAS can calculate the sales data by year during processing and then put the results into permanent storage. This storage of aggregations avoids the need to compute each year's sales at query time and is the key difference between retrieving data from a relational data mart or from a multidimensional database like SSAS.
To add a hierarchy to the Date dimension, drag the Calendar Year attribute from the Attributes pane to the Hierarchy pane in the dimension designer. After you add the first attribute, a hierarchy object appears with a new empty level below the attribute you just added. Add the Quarter and Month attributes to the hierarchy by dropping each one into the space for the empty level. Next, rename the hierarchy by right-clicking Hierarchy, then clicking Rename and typing Year.
Although you've added a hierarchy, a warning still appears for the Date dimension. Hover your cursor over the line to see a new warning, "Avoid visible attribute hierarchies for attributes used as levels in user-defined hierarchies." This message suggests you prevent users from viewing an attribute by itself when that attribute has been included in a hierarchy. In other words, a user should only be able view the attribute when navigating within the hierarchy.
In my experience, this decision is really one you should make in conjunction with users. If you decide to heed this warning, select Quarter in the Attributes pane and then, in the Properties window, change the AttributeHierarchyVisible value to False.
Attribute Relationships
Another warning appears on the hierarchy itself. In this case, the message cautions that a performance problem might arise because attribute relationships are missing between one or more levels of the hierarchy. Attribute relationships are used by SSAS to optimize query performance and aggregation design, to reduce the amount of storage required for a dimension, and to speed up database processing time.
Click the Attribute Relationships tab in the dimension designer. (Note that this tab is available only if you're using Analysis Services 2008.) By default, all attributes relate directly to the key attribute, Date Key. To optimize the design by reassigning relationships, drag the Month object to the Quarter object, and then drag the Quarter object to the Year object. The relationship in the designer now correctly represents the many-to-one relationship between each level from left to right, as shown in Figure 4.
Figure 4 Attribute Relationships
Next, add a hierarchy to the Product dimension called Categories that contains Category, Subcategory and Product from top to bottom. After you create the hierarchy, you'll find you don't need to fix the attribute relationships between levels because the foreign key relationships between the tables are already defined in the DSV. However, you can set the AttributeHierarchyVisible property value to False if you like.
Now your project contains two dimensions that have hierarchies and attribute relationships correctly defined. As you learn more about dimension design, you'll find there are several properties available for tuning performance and controlling specific behavior in the user interface.
Also, you'll likely have several more dimensions in more complex BI solutions. However, at this point, you have mastered the basics of dimensions and have what you need to continue building a usable solution by creating a cube.
Creating a Cube
Just as you use the Dimension Wizard to start the process of defining your dimensions, you use the Cube Wizard to begin creating your cube. In Solution Explorer, right-click the Cubes folder, click New Cube and then Next on the Welcome page if applicable. On the Select Creation Method page, keep the default option. (You'd use the other two options when you have a design you want to create manually or with a template, then have SSAS generate tables in the data source based on that design, which you'd need to populate with data using Integration Services before you could browse your cube.) Click Next.
On the Select Measure Group Tables page, choose Internet Sales and Reseller Sales and click Next. (Measure group tables are synonymous with fact tables.) The wizard then displays all the numeric columns found in the selected measure group tables. In keeping with the simple approach you've taken thus far, clear all the measures by clearing the Measure check box at the top of the page, and then select the following measures from each group (Internet Sales and Reseller Sales): Order Quantity, Total Product Cost and Sales Amount.
You can also rename measures on this page. Just right-click on the measure name and type the new name, but make sure each measure name is unique. The measure name should be short, but not so short that it's cryptic. Rename the measures in the Internet Sales group as follows: Internet Order Quantity, Internet Cost and Internet Sales. Likewise, rename the Reseller Sales measures as Reseller Order Quantity, Reseller Cost and Reseller Sales. Click Next.
On the Select Existing Dimensions page, the wizard displays the dimensions you've already created. Click Next. If there are any tables in the DSV that are not yet referenced as a measure group table or by an existing dimension, the Select New Dimensions page will display to allow you to quickly add any other dimension you might need. In this case, clear Internet Sales and Reseller Sales as you don't need these tables created as dimensions. (Although technically these are measure group tables, the presence of sales order numbers in the tables would allow you to create dimensions to support reporting or analysis of sales by sales order number.)
Click Next, rename the cube as Sales, and click Finish. Congratulations!
You've successfully built a simple cube! The cube designer displays the measure groups and the dimensions added to the cube in the panes on the left, and the source DSV on the right, as shown in Figure 5.
Figure 5 Cube Designer Showing Measure Groups and Dimensions
The first step to take after building a cube is setting the FormatString property of each measure to make it easier to view values in the cube browser. The quickest way to do this is to view the measures in a grid. In the toolbar of the cube designer, click the fifth button from the left to toggle from Tree View to Grid View. In Grid View, you can use the Ctrl key to select several measures at once. First, select Internet Order Quantity and Reseller Order Quantity. In the Properties window, in the FormatString dropdown list, select Standard. Then select all remaining measures to set the FormatString property to Currency.
Adding Calculations
A powerful feature of SSAS is the ability to add calculations using the Multidimensional Expression language (MDX). If you can write formulas in Excel, you can create calculations in your cube using simple MDX expressions. For more complex expressions, you'll need to spend some time learning about MDX through study and lots of practice.
Remember that one of the design goals of this solution is to measure profitability by sales channel. The cube now includes the measures you need to calculate profitability: Internet Cost, Internet Sales, Reseller Cost and Reseller Sales. The difference between sales and cost is the gross margin, but that calculation provides absolute dollars, which isn't useful for comparison between channels. In addition to gross margin, you need to calculate the gross margin percentage by dividing gross margin by the sales amount.
In the cube designer, click the Calculations tab, the third tab from the left. Then click the New Calculated Member button on the toolbar, the fifth button from the left. In the Name text box, type [Internet Gross Margin].
The brackets are required when the name includes spaces. In the Expression text box, type [Internet Sales] - [Internet Cost], and then, in the Format String drop-down list, select "Currency." Now repeat these steps to add the calculations shown in Figure 6.
Name | Expression | Format String |
[Reseller Gross Margin] | [Reseller Sales] - [Reseller Cost] | "Currency" |
[Internet Gross Margin Pct] | [Internet Gross Margin] / [Internet Sales] | "Percent" |
[Reseller Gross Margin Pct] | [Reseller Gross Margin] / [Reseller Sales] | "Percent" |
Figure 6 Adding Calculations to the Cube
Deploying an Analysis Services Database from BIDS
Thus far, although you have built the objects necessary to create an SSAS database on the server, the database itself doesn't yet exist. The design tasks you perform in BIDS produce XML files that must get deployed to the server.
Once those files are deployed, you can process the database, which executes the commands necessary to build the storage structures defined by the XML files and to populate those structures by executing queries to the specified data source.
Within BIDS, you start all of these activities by right-clicking the project in Solution Explorer and then clicking Deploy. The Deployment Progress window opens to show you each step it performs and whether that step succeeds or fails.
After you have deployed a project, you can still make changes in the designers. Just deploy the project again as previously described to move your changes to the server and launch processing. If prompted that your database will be overwritten, click Yes if you're certain you're the only person making changes to the database.
Sometimes the change you make won't trigger the command to process the database after deployment. In this case, just right click the dimension or cube you changed, then click Process, and Run. If you process a dimension using the Process Full option (which is required if you make a significant structural change to the dimension), you might also need to process the cube.
Browsing the Cube in BIDS
At each stage of development, as long as you have successfully deployed and processed the solution, you can review your progress from the user's point of view. In the cube designer, click the Browser tab. The pane on the left displays the objects in the SSAS database in a metadata tree (shown in Figure 7). The top node of the tree is the cube. Expand the Measures node and the folders it contains to view all of the measures available, and then expand the Order Date and Product nodes to view the attributes in these dimensions.
Figure 7 Sales Cube Metadata Tree
You might be wondering why the cube contains Due Date, Order Date and Ship Date when you created only one Date dimension. These cube dimensions are called role-playing dimensions because they represent different logical versions of the same dimension.
They appear automatically in the cube when you include the Date dimension because the fact table contains three different foreign key columns that all relate back to the single table on which the Date dimension is based to track the order, shipping and due events separately. (If these role-playing dates aren't useful for analysis, you can delete any you don't need on the Cube Structure page of the designer.)
To view cube data, you drag objects from the metadata tree to the grid in the center of the designer. Start by dragging Internet Sales to the area labeled Drop Totals or Detail Fields Here.
Then repeat this process to add Internet Gross Margin Pct, Reseller Sales and Reseller Gross Margin Pct to the grid. With the results of this simple query now visible in the cube browser (as shown in Figure 8), you can see that Internet Sales are significantly more profitable than Reseller Sales.
Figure 8 Query Results in the Cube Browser
You can continue exploring these results by dragging attributes to the sections of the grid for rows, columns or filters, which collectively are called axes, or by dragging attributes and measures off the grid. The process of adding objects to the axes to refine the query is called slice and dice by BI users and is a very fast way to query data without writing any code. For example, to slice by year, drag Order Date.Year to the section labeled Drop Row Fields Here.
Because Order Date.Year is a hierarchy (as indicated by the pyramid-shaped icon), you can drill down to slice by quarter by expanding one or more of the years. Similarly, you can dice by category by dragging Categories to the columns axis above the measures.
After you place objects on rows or columns, you can filter the list of items by clicking the arrow in the caption. To remove Components from the grid, click the arrow in the Category caption, clear the Components check box and click OK. Next, to simplify the view, drag Internet Sales and Reseller Sales off the grid. You can now easily compare the profitability of the Internet and Reseller channels by year and quarter and by product category, as shown in Figure 9.
Figure 9 Profitability of Sales Channels by Year and Quarter and by Category
Sharing the Cube with Users
By default, the cube is accessible only by the server administrator. However, once your cube is deployed to the Analysis Server, you can set permissions to grant users access to it, enabling them to use their favorite tool to browse the cube. Microsoft Excel 2007 is a popular choice for interactive browsing, but you can also use Reporting Services to distribute reports based on cube data, as I'll explain in a future article.
Applying Your New Skills
Now that you've built a small database using the sample AdventureWorks data, you should apply what you've learned to your own data by building a simple cube. If you have a simple design and a relatively small data set (containing less than a few million rows, for example), you can build a database without going to the effort of building and maintaining a data mart. Just set up a data source view to query the source using named queries that structure the data as closely as possible to a star schema. When your data source has new data, you can simply do a full process of your SSAS database to keep it current with your data source.
If you have more data to put into a cube, you should read more about SSAS in Books Online and other resources as this article explains only the very basic points you need to know to build a simple cube. For example, at the beginning of this article, I mentioned that one benefit of SSAS is aggregation management, but to explain more about that subject is beyond the scope of this article.
You can learn more about aggregations and other advanced topics by reading "Microsoft SQL Server 2008 Analysis Services Step by Step" (Microsoft Press, 2009), recently published by a good friend of mine, Scott Cameron, who started his business intelligence career with me many years ago. Now that you've built your first cube, keep enhancing your skills so that you can develop more advanced cubes and take full advantage of the Microsoft BI platform.
Stacia Misner, a BI consultant, educator and author, as well as the founder and principal of Data Inspirations. She has 25 years of experience in the IT industry, with nine years focused on the Microsoft BI stack. Misner has written several books about BI and SQL Server. Her latest book, "Microsoft SQL Server 2008 Reporting Services Step by Step" (Microsoft Press, 2009), was released earlier this year. She can be reached at smisner@datainspirations.com.
Additional Resources
Analysis Services Development (Books Online)
msdn.microsoft.com/library/bb500183.aspx
Analysis Services Security (Books Online)
msdn.microsoft.com/library/ms175386.aspx
Analysis Services 2008 Videos (Books Online)
msdn.microsoft.com/library/dd299422.aspx