Consuming the results

04/07/2016

From: Developing big data solutions on Microsoft Azure HDInsight

Now that the solution has generated some useful information, the team can decide how to consume the results for analysis or reporting. Storing the results of the Pig data processing jobs generated by the WordCount.pig script in a Hive table makes it easy for analysts to consume the data from Excel.

To enable access to Hive from Excel, the ODBC driver for Hive has been installed on all of the analysts' workstations, and a data source name (DSN) has been created for the HDInsight cluster. The analysts can use the Data Connection Wizard in Excel to connect to HDInsight using the DSN and import data from the TopWords table, as shown in Figure 1.

Figure 1 - Using the Data Connection Wizard to access a Hive table from Excel
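
The same DSN can, in principle, be used from a script as well as from Excel. The following is a minimal sketch in Python, assuming the third-party pyodbc package is installed alongside the Hive ODBC driver. The DSN name HDInsightHive, the row limit, and the helper function names are hypothetical and not part of the original solution; only the TopWords table name comes from the scenario.

```python
def build_top_words_query(limit=100, table="TopWords"):
    """Return a HiveQL query that reads up to `limit` rows from `table`."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    if not table.isidentifier():
        # Guard against building a query from an unexpected table name.
        raise ValueError("table must be a simple identifier")
    return f"SELECT * FROM {table} LIMIT {limit}"


def fetch_top_words(dsn="HDInsightHive", limit=100):
    """Fetch rows from the Hive table through an ODBC DSN.

    Assumption: `dsn` names a DSN configured for the HDInsight cluster;
    "HDInsightHive" is a placeholder, not a name from the original article.
    """
    import pyodbc  # third-party; imported here so the query helper works without it

    # Hive does not support transactions, so the connection uses autocommit.
    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(build_top_words_query(limit))
        # cursor.description holds one tuple per column; item 0 is the column name.
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()
```

Calling fetch_top_words() on a workstation where the DSN exists would return each row as a dictionary keyed by column name. The query selects all columns because the column names of the TopWords table are not stated in this section.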

Note

For details of how to consume the output from HDInsight jobs in Excel, see Built-in data connectivity in the topic Consuming and visualizing data from HDInsight.

After the data has been imported into a worksheet, the analysts can use the full capabilities of Excel to explore and visualize it, as shown in Figure 2.

Figure 2 - Visualizing results in Excel

If you want to rerun the process with additional data after the results have been extracted into a visualization tool, you will need to refresh the results. This typically means rerunning the scripts that perform the analysis in HDInsight and then refreshing the view of the results in the tool. For details of how this can be done, depending on the tools you are using, see the section “Scheduling data refresh in consumers” in Scheduling solution and task execution.
