다음을 통해 공유


Data Virtualization Using Power Query

PROBLEM STATEMENT:

   How to use Data Virtualization in Data Integration Layer/ BI Solutions?

Description:

           Managing Multiple Data Integration systems poses below issues using traditional BI Tools:

  1. Increased Hardware and ETL Maintenance cost.
  2. Data Source Migration is difficult and time consuming.
  3. Time to accomplish new report needs is high.
  4. Data purging /archiving, takes away an important history data from User.
  5. User does not have access to Operational data, Real-time Data.

 

SOLUTION:

                    On one hand, our tradition ETL Systems are failing to fulfill Current user demands as shown above. On the other side, Data is growing at a very rate of almost 50% per year. To accommodate this growing data and changing needs of user, there is a need a whole new ETL framework.

One such architecture is Data Virtualization. Definition of Data Virtualization is

“Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology. This abstraction layer hides all the technical aspects of data storage. The applications don’t have to know where all the data has been stored physically, where the database servers run, what the source API and database language is, and so on.”

Technically, data virtualization can be implemented in many different ways. Like Federation, Placing data stores in Cloud, Customized Software abstraction layer etc.

In this paper we will try to achieve Data Virtualization using Microsoft SSAS (Multidimensional system) and much evolving Power Query Formula language.

Getting Stage Ready

 Software prerequisites:

  Visual Studio 2012

  SQL Server Data Tools for 2012

  SQL Analysis Server 2012 (Tabular Mode)

  Power Query (Excel-2013)

  First step towards Data Virtualization is to create a model of Business requirement. A model will consist of:

  • Entities (Direct Physical table/Query Based)
  • Relationships
  • Measures
  • KPIs

Model Creation:

 Scenario: Create 2 tables, Define Measure and show their data by establishing a join between them.

 To create model we are using SSDT tools with below steps.

  • Create a Database as Sample DB and 2 Tables as specified in below attached BIM file.
  • Create a Multidimensional Project in SSDT.
  • Import the Database and provide table/Query selection to create entities.
  • Once entities creation is done, create relationships by drag/drop of mapping columns.
  • Define Measure.
  • Save the Model.

Model will look like below:

In this we have defined measure and create join between 2 tables.

Create Power Query and Connectionstring:

  • Connect to excel and click Power Query.
  • One way to get Power Query script generated is Select Power Query->select DataSource-> Provide Server details and select Database->Table. This will generate basic Power query Script for you.

 

  • Other ways is, Select Blank Query like below Power Query->From Other Sources->Blank Query:

 

 

 

 

  • Give below code. Run it to verify the query. It should bring the expected Data as per scenario.

With this query content we will create .m file which is our Power Query program file:

 

section Section1;

shared Query1 =

let

    Source = Sql.Databases("SQLServer"),

    CONTROL = Source{[Name="SQLDatabase"]}[Data],

    JobRunTracker = CONTROL{[Name="SQLTable"]}[Data],

         Query=Table.SelectColumns(JobRunTracker,{"JobRunID","STMappingID","RecordsCopied","JobRunStatus"})

in

    Query;

 

section Section1;

shared Query2 =

let

    Source = Sql.Databases("SQLServer"),

    CONTROL = Source{[Name="SQLDatabase"]}[Data],

    JobRunTracker = CONTROL{[Name="STMapping"]}[Data], Query=Table.SelectColumns(JobRunTracker,{"STMappingID","SourceTableName","SourceDBName"})

in

    Query;

 

  • Generate Connection String for .m files using System.IO.Packaging and System.Net.Mime. Below is the sample code for the same:

 

using (Package package = Package.Open(memoryStream, FileMode.Create))

{

    Uri formularUri = PackUriHelper.CreatePartUri(

                 new Uri(@"Section1.m", UriKind.Relative));

    PackagePart formulaPart = package.CreatePart(formularUri,

                 MediaTypeNames.Text.Plain);

    using (Stream formulaStream = formulaPart.GetStream())

        using (var streamWriter = new StreamWriter(formulaStream))

         {

            streamWriter.Write(mProgramContents);

                }

 }

 packageContents = memoryStream.ToArray();

 string mashup = Convert.ToBase64String(packageContents);

 

  • Replace the Connection strings in <DataSource> element of in the Alter/Create script of the Tabular Cube for the required entity with the above generated connection string.

 

<DataSource xsi:type="RelationalDataSource">

    <ID>[SOME GUID]</ID>

    <Name>[NAME]/Name>

    <Annotations>

        <Annotation>

            <Name>ConnectionEditUISource</Name>

            <Value>SqlServer</Value>

        </Annotation>

    </Annotations>

    <ConnectionString>location=Query1; mashup=UEsDBA ... AAAAA==</ConnectionString>

    <ImpersonationInfo>

        <ImpersonationMode>ImpersonateServiceAccount</ImpersonationMode>

    </ImpersonationInfo>

    <ManagedProvider>Microsoft.Data.Mashup</ManagedProvider>

    <Timeout>PT0S</Timeout>

</DataSource>

Also see the highlighted tag, that should be added to enable SSAS to understand that this Data Source is Power Query Source.

DATASOURCES:

**                ** Power Query can accommodate a variety of DataSources like :

  • SQL Server
  •  Non-SQL :Flat files,Xml files
  • Windows Azure
  • Windows HDInsight
  • OData Feed
  • Active Directory
  • etc.

On authentication side, Windows authentication is supported with Data Privacy option through Excel.

TRANSFORMATIONs:

  So what we want in Power Query Model is, provide details of Data Source and Provide means of Data Transformation that can be served to User.

In the above Model we have used below transformations:

  • Join 2 Tables
  • Getting Selected Columns
  • Getting Selected Rows.
  • Rename of Column
  • Splitting Columns.
  • Combine 2 Columns
  • Perform Aggregate functions,
  • Conversion Functions
  • Date functions.

 

Many more transformations can be found in Power Query Formula language specifications here.

Deploy and Process the Model: Deploy the model and Process it using SSMS, connecting Cube project.

Define Security: Tabular/Cube project support Role based security. The same can be used for this  Model also.

Once model is deployed, Connect using regular excel pivot options Or Cube Directly.

** **

OBSERVATIONS:

Highlights

  • Easy to use for simple metric calculation and advantage of In-Memory feature.
  • Have support for Wide range of transformations.
  • Provide connectivity to different sources like Hadoop, Azure etc.
  • Start with good language support.

Limitations

  • Multiple Virtual entities/Power Query Output are not supported for now.
  • Small and Medium scale application are supported for now.

CONCLUSION:

Also what we have seen is association of Power Query Language with SQL Server Analysis Services. There is upcoming technology for In Memory SQL Server Database with SQL Server 2014(More on this can be found on related White Papers).

To conclude, Power Query is one of the In-Memory technology are coming up which are setting the stage for Data Virtualization, so start exploring.