Corporate data model integration
From: Developing big data solutions on Microsoft Azure HDInsight
Senior executives of the e-commerce division at Adventure Works want to expose the newly discovered information from the web server log files to a wider audience, and use it in BI-focused business processes. Specifically, they want to compare the ratio of web page hits to sales as a core metric that can be included as a key performance indicator (KPI) in a business scorecard. The goal is to achieve a hit to sale ratio of around 6.5% (in other words, between six and seven items are sold for every hundred web page hits).
Scorecards and dashboards for Adventure Works are currently based on the SSAS corporate data model, which is also used to support formal reports and analytical business processes. The corporate data model is implemented as an SSAS database in tabular mode, so the process to add a table for the IIS log data is similar to the one used to import the results of a HiveQL query into a PowerPivot model. A BI developer uses SQL Server Data Tools to add an ODBC connection to the HDInsight cluster and create a new table named Page Hits based on the following query.
SELECT logdate, SUM(sc_bytes) sc_bytes, SUM(cs_bytes) cs_bytes, COUNT(*) pagehits
FROM iis_log GROUP BY logdate
Notice that this query includes more columns than the one previously used in the personal data model, making it useful for more kinds of analysis by a wider audience.
Note
The fact that Adventure Works is using SSAS in tabular mode makes it possible to connect to an ODBC source such as Hive. If SSAS had been installed in multidimensional mode, the developer would have had to either extract the data from HDInsight into an OLE DB compliant data source, or base the data model on a linked SQL Server database that has a remote server connection over ODBC to the Hive tables.
After the Page Hits table has been created and the data imported into the model, the data type of the logdate column is changed to Date and a relationship is created with the Date table in the same way as in the PowerPivot data model discussed in Report level integration. However, one significant difference between PowerPivot models and SSAS tabular models is that SSAS does not create implicit aggregated measures from numeric columns in the same way as PowerPivot does.
The Page Hits table contains a row for each date, with the total bytes sent and received, and the total number of page hits for that date. The BI developer created explicit measures to aggregate the sc_bytes, cs_bytes, and pagehits values across multiple dates based on data analysis expression (DAX) formulae, as shown in Figure 1.
Figure 1 - Creating measures in an SSAS tabular data model
These measures include DAX expressions that calculate the sum of page hits as a measure named Hits, the sum of sc_bytes as a measure named Bytes Sent, and the sum of cs_bytes as a measure named Bytes Received. Additionally, to support the requirement to track page hits to sales performance against a target of 6.5%, the following measures have been added to the Internet Sales table.
Actual Units:=SUM([Order Quantity])
Target Units:=([Hits]*0.065)
A KPI is then defined on the Actual Units measure, as shown in Figure 2.
Figure 2 - Defining a KPI in a tabular data model
When the changes to the data model are complete, it is deployed to an SSAS server and fully processed with the latest data. It can then be used as a data source for reports and analytical tools. For example, Figure 3 shows that the dashboard report used to summarize business performance has been modified to include a scorecard for the hits to sales ratio KPI that was added to the data model.
Figure 3 - A report showing a KPI from a corporate data model
Note
Dashboards are a great way to present high-level views of information, and are often appreciated most by business managers because they provide an easy way to keep track of performance of multiple sectors across the organization. By including the ability to drill down into the information, you make the dashboard even more valuable to these types of users.
In addition, because the data from HDInsight has now been integrated at the corporate data model level, it is more readily available to a wider range of users who may not have the necessary skills and experience to create their own data models or reports from an HDInsight source. For example, Figure 4 shows how a network administrator in the IT department can use the corporate data model as a source for a PivotChart in Excel that shows data transfer information for the e-commerce site.
Figure 4 - Using a corporate data model in Excel