Analyzing data from the data warehouse

patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

Having built and populated the data warehouse, the data analysts were ready to start analyzing the data in the data warehouse tables. In this example scenario, Excel is used to examine all of the data. However, as the volume of data increases it will be necessary to preselect parts of the data or an aggregated result by applying some processing within HDInsight to create a Hive table of manageable size.

Analyzing data in Excel

The data analysts typically use Excel for analysis, so the first task they undertook was to create a PowerPivot data model in an Excel workbook. The data model uses the Hive ODBC driver to import data from the data warehouse tables in the HDInsight cluster, and defines relationships and hierarchies that can be used to aggregate the data, as shown in Figure 1.

Figure 1 - A PowerPivot data model based on a Hive data warehouse

Figure 1 - A PowerPivot data model based on a Hive data warehouse

Note

For information about using PowerPivot see PowerPivot. For information about using the Hive ODBC driver see Built-in data connectivity.

The business analysts could then use the data model to explore the data by creating a PivotTable showing the total cost of property and crop damage by state and time period, as shown in Figure 2.

Figure 2 - Analyzing the data with a PivotTable

Figure 2 - Analyzing the data with a PivotTable

Note

For information about using a PivotTable see Visualizing data in Excel.

Analyzing data with Power Map

The data includes geographic locations as well as the date and time of each tornado, making it ideal for visualization using Power Map. Power Map enables analysts to create interactive tours that show summarized data on a map, and animate it based on a timeline.

The analysts created a Power Map tour that consist of a single scene with the following two layers:

  • Layer 1: Accumulating property and crop damage by state costs shown as a stacked column chart.
  • Layer 2: Average tornado category by latitude and longitude shown as a heat map.

The Power Map designer for the tour is shown in Figure 3.

Figure 3 - Designing a Power Map tour

Figure 3 - Designing a Power Map tour

Note

Power Map is a great tool for generating interactive and highly immersive presentations of data, especially where the data has a date or time component that allows you to generate a “changes over time” animation. For more information see Visualizing data in Excel.

The scene in the Power Map tour was then configured to animate movement across the map as the timeline progresses. While the tour runs, the property and crop losses are displayed as columns that accumulate each month, and the tornados are displayed as heat maps based on the average tornado category (defined in the Enhanced Fujita scale), as shown in Figure 4.

Figure 4 - Viewing a Power Map tour

Figure 4 - Viewing a Power Map tour

After the tour has finished the analysts can explore the final costs incurred in each state by zooming and moving around the map viewer.

Next Topic | Previous Topic | Home | Community