다음을 통해 공유


MVP Series: Integrate Self-Service BI in your Corporate BI strategy

For the month of October and November we are thrilled to have special guest authors from the Canadian MVP Award Program contributing posts around their favourite tips, tricks & features of SQL 2014. For the next few weeks, we will be posting a different article from one of our Canadian SQL Server MVPs each week. We hope you enjoy them, please feel free to leave a comment. Thanks to Christian Coté for this week's article!

Over the last few years, we’ve heard a lot about the term self-service BI from all major players on the market. Several tools are available and in this article, I will talk about the various BI categories, their toolset and how they be part of your corporate BI strategy.

This article is not a comprehensive list of all possible features of every tool available in a specific toolset nor is it a tutorial on how to use them.

Different BI category

There are three different categories of business intelligence (BI) available in the Microsoft BI suites. The following sections talk about each of them and list the main pros and cons for each category.

Self-Service BI

Self-service BI is all about giving users or analysts access to different source of data. The source of data can either be a corporate data warehouse and/or any standalone source. The goal of self-service BI leverage the knowledge that users and analyst might have on their data. The assumption is that they know it sometimes better than ETL developers so the tool must help them do their job with minimal intervention of IT. Most of the time, IT help is necessary to setup things like data connections and data source location.

In general, the volume of data used in self-service BI is from small (thousands of rows) to medium (less than 10-20 million rows). The purpose of self-service BI is to give fast access to analyst so the users can report the result as fast as they can. They can also refresh or modify their data from time to time by themselves without having to contact a developer to do it for them.

Toolset

It is well known that analysts and most users are very familiar with Excel. Most of the time, that’s the tool they used well before the advent of new add-ins that help them doing their analysis and reports. Depending of the version of Excel they use, several add-ins are available for it.

  • Excel 2010 professional plus: The only self-service BI tool available with this version is SQL Server 2012 PowerPivot for Excel 2010. It is available as a free download.

  • Excel 2013 professional plus:

    • PowerPivot is integrated into the product. You simply have to enable it in the add-ins options.

    • Power Query for Excel: Provides self-service ETL to help integrate a wide variety of data sources that can be available in the clouds or on premise (locally, not in the clouds). Power Query provides a way to share queries resulting of the ETL applied to the data sources.

    • Power View: Like PowerPivot, this tool in integrated in Excel 2013. It provides easy and rich reporting capabilities.

  • Excel with Office 365 subscription: provides all of the above plus

    • Power Map: a tools that provides mapping of geographical and temporal data in 3D. 

It is very important to know that all the data used for analysis is self-contained in the PowerPivot model in Excel. In fact, PowerPivot creates a small Analysis Service cube in the file. We’ll see later how this cube is used.

PowerPivot connectivity

With PowerPivot, you can easily connect to a wide variety of sources. The following screen shot shows connections available with it.

Power Query Connectivity

Power Query provides connectivity to a wider variety of sources than PowerPivot.

  •   Online: The example below shows how we can quickly query Census data from multiple web sources. The data is appended in the query and can be added to PowerPivot model.

  • From text files: data can be loaded from a variety of text files as well as loading an entire folder that would have file with similar structure.

  • Databases

  • Other sources: a wider variety of “other” sources than PowerPivot capability.

 

Team BI

Once the analyst or user has created is Excel file and done its analysis, the next step is to share it with his or her co-workers. They will look at the file to revise it or do further analysis with it. As stated above, the only problem with self-service BI, as we have seen so far, is that all the data resides in the Excel document. If the user wants to share the file, it has to be copied on either a shared drive on the network or be saved in a shared folder in the clouds.

The stated approach have the following limitations:

  • If the consumer of the file doesn’t have the correct toolset cannot modify the document’s data.

  • No scheduled refresh is possible, the data has to be refreshed manually.

  • No more than one user can open the document since the first one will lock the file.

Team BI allows us to share our data/reports created in using a self-service approach more easily and circumvents the limitations stated above. Basically, team BI provides ways to consume previously created reports or queries in a Portal. As we’ll see in the toolset section, that portal can be hosted either on premise or in the clouds.

Toolset

On premise

SharePoint is the portal you’re going to use if your team BI is used on premise. Once the analyst or user is satisfied with its reports made from PowerPivot, they can upload their Excel workbook in SharePoint PowerPivot gallery. The PowerPivot gallery is the portal that the user will use to access the reports.

Here is sample image of the PowerPivot Gallery:

Here are some other features that SharePoint provides for Team BI:

  • The uploaded PowerPivot model is restored in a SQL Server Analysis Services (SSAS) instance in SharePoint. It allows users to have better performances when they use the reports in Excel Services.

  • Reports: Reporting services (reports and report parts) can be created and run directly from SharePoint instead of Report Manager. The users have access their reports and PowerPivot reports from a centralized location, that is, in the same portal.

  • Power View is also greatly enhanced in SharePoint. You can connect Power View visualizations to your cubes in data warehouse. You can also save your Power View visualizations in PowerPoint; the slides will be able to connect to your cubes if they are available.

There are some limitations if you want to use SharePoint PowerPivot gallery with all toolset available in self-service BI:

  • Only PowerPivot reports are available to be consumed in Excel Services on SharePoint. Excel Services gives an Excel like experience in a browser.

  • Power view and Power Map visualizations as well as Power Query shared queries are not supported (at time of writing this article) in SharePoint 2013.

  • Power view visualizations will have to be re-done in Power view for SharePoint.

With SharePoint, you have the capability to refresh your data based on a schedule. The user can also refresh the data by himself manually. Also, as long as the user has access to SharePoint and security is configured properly, access to the PowerPivot gallery does not require anything else in term of licences.

It’s worth enforcing that in SharePoint 2013 Excel Services, the experience is very similar as if the user would interact with the data directly in Excel.

In the clouds – Power BI

If you have an Office 365 subscription with Power BI, you have basically access as what is called SharePoint online. SharePoint online support the full fledge self-service BI toolset and provides access to a BI portal. Also, there’s a Windows 8 app that is meant to give you access to your Excel reports and Power View visualization reports on the go. Microsoft said that they plan to provide a mobile app for Android and IOS in the future. But at time of writing, there’s only the Windows app available.

Here is a screen shot of the Power BI Windows app.

The SharePoint Online portal gives you more options:

It looks very similar to the SharePoint PowerPivot gallery discussed previously. If we select a specific report, we get several options:

 

Here are the top features of Power BI

  • Data can be refreshed on schedule

  • It has a favorite and featured section, any report can be moved to any of these sections for faster access.

  • It has Q&A feature that enables “human” question to be used to query the report data. Something like “How many people will there be in Canada in 2020”

  • Supports all Power BI toolset.

Here are the limitations:

  • Data is stored in Excel data model (PowerPivot). At time of writing, a single workbook can have a maximum size of 250 MB. It seems small but for most team BI reports, it is sufficient. Remember that the data is compressed in PowerPivot, so 250 MB might be a lot of rows.

  • Automatic data refresh with data on premise is limited to relational sources (Oracle, SQL Server) and some OData and text files. Cubes are not supported yet. For a complete list of supported data sources, there is an excellent white paper on this topic: data management gateways.

Team BI wrap-up

After the analyst or user has created its Excel file with Power Query, Power Pivot, Power View and/or Power Map in self-service BI mode, they upload file in SharePoint be it on premise or online. This way, their work can easily be shared and modified by other member of the team.

Corporate BI

Basically, corporate BI is what can be made available to the whole enterprise employees and/or customers. There we have a data warehouse and one or more cubes. The data loaded in the data warehouse has to be validated and it is considered as official, meaning that there is on one single of the truth. As opposed to self-service BI where, as stated before, users have complete liberty on what can be loaded in their Excel workbook or report.

How can we use Self-Service BI in our Corporate Environments

Finally, the main topic of this article. There are great advantages of using self-service BI and/or team BI in a corporate environment. In corporate BI, when IT is asked to add a new source, even if it’s only for a one-time thing, it takes some time. For good reasons, IT have their schedule and their infrastructure to take care of. Business users or analysts sometime need to have the data loaded and merged with official data in a much faster pace than IT can provide.

Early Prototyping / Data Exploration

It’s always hard to get users requirements before they see their data. By using self-service, users and developers can better understand the data and what they want to do with it. Also, it provides data warehouse architects invaluable hints on how the data will be used. They are able to design data models that will better suits reporting needs.

By including self-service and team BI in the data warehouse lifecycle, team BI reports can be maintained for some time while the development team integrates the data in their database, cubes, reports and dashboards. This remove the pressure on the development team and allow adjustments to the workbook before doing the final integration in the data warehouse.

It also integrates very well in agile methodology because the users are integral part of the development team. They are doing some of the work, which is, creating the first set of report with the development team. Also sprints are better defined since the users and development team have a better of what needs to be done.

Non Repetitive Data Load

It happens sometimes that users wants to merge data that might be used in an ad-hoc manner. Most of the time, this is not a source that will be part of the data warehouse because it will be used for a specific point in time analysis. In the traditional data warehouse development, a developer would load this data into a temporary database table and would provide the connection details to the user. The first thing we know, the user has to modify the source because it doesn’t do exactly what user need to do his analysis or create his reports. Again, the user would ask the developer to do a slight modification. This process can be repeated many times and takes invaluable time and resources for both users and developers.

Other times, users might come back 6 months after the fact to load another version of the one-time source stated above. The developer might not have saved its code (or did not put it under source control) or, worst, the developer has gone and the code is lost.

Again, having self-service and team BI in place would save a lot of time to both users and developers. The developer might help users but most of the work would be done in Excel without interfering too much with developer and user tasks and schedules. And, with team BI, at least, the Excel workbook is saved in SharePoint, not somewhere on a user or developer personal drive.

Power BI and Office 365 and Corporate BI

If the enterprise use Power BI, they can easily setup a BI portal in no time. IT does not have to setup and configure SharePoint servers and integrate in their SharePoint farm. All they have to do is to give access to Power BI site to the users and setup security on different items there.

In Power BI, it is assumed that the user have access to the workbook or don’t have access at all. This is different than corporate BI where we can setup row level security or give access to some parts of the data warehouse or the cubes in a more controlled way.

But, security apart, Power BI gives users access to their data much faster and they can setup sharing in a snap.

Access Corporate Data Warehouse Data from Power BI

As stated before, Power BI workbooks can be refreshed based on a schedule. Data can be refreshed daily or weekly. To be able to access on premise data, Power BI provides a special tool for that, data management gateways. This is a tool that can be installed

It was also mentioned that data gateways do not support cubes for the moment. One reason good reason for that is because PowerPivot is a cube by itself. It can query another cube but it’s much more difficult since it would return one result set that would include both dimension data and measures. We could split the result set afterwards using DAX (PowerPivot query language) but it’s much easier to query the underlying data warehouse. The latter is usually an Oracle or a SQL Server relational database and these sources are supported by the data gateways.

Microsoft is aware of the fact that cubes are usually seen as semantic layer and they contains some level of abstraction over the relational data warehouse such as dynamic ratios calculations. That’s the reason that enhancing the data gateways to support cubes is one of their priorities.

Restore a PowerPivot file in SQL Server Analysis Services (SSAS, on premise)

One way to give access to the PowerPivot data model to a broader audience is to upload the PowerPivot data model into an Analysis Services 2012 or 2014 (tabular) instance on premise. Using SQL Server management studio, a DBA or developer can easily restore a PowerPivot data model to SSAS:

 

The PowerPivot data model structure and data are restored to SSAS and can immediately be used by end users.

This cube can therefore be opened in SQL Server Data Tools (SSDT) to be enhanced and redirect manual data sources (Excel link tables, connection string) to official corporate source. Once IT developers have finished working on the cube, they simply replace the temporary one made from PowerPivot. The process is almost transparent to the end users or report developers.

Wrap-Up

As data warehouse practitioners, we’ve never been able to integrate new data sources fast enough for the business. Despite the fact that our toolset have greatly evolved over the years, business requirements and data sources evolve at a much faster pace than our capabilities have THE data warehouse that would answer all their needs.

That’s where self-service and team BI come to our rescue. They might not be the definitive target IT or data warehouse team would like to have in term of data quality and robustness. But they provide the most important goal of every IT department: give access to business people to their data as fast as possible and see how they use it to be able to integrate it later in our data warehouse if necessary. Because most of the time business users don’t care about technique, architecture or processes. All they want is accessing their data and if possible in an Excel spreadsheet :).

 

Christian Coté is an IT professional with more than 14 years of experience working on business intelligence projects. Before SSIS was released, Christian developed ETL processes using a range of tools on multiple platforms. Christian has been a presenter at a number of conferences and code camps. He currently co-leads the SQL Server PASS chapter in Montreal.

Comments