Which is the best option for data warehouse / Power BI solution?
Hello,
I am building a data warehouse in Azure from several external data sources. Data can be imported via pipelines in Data Factory (or Azure Synapse). Data is to be stored in Azure. Reporting through Power BI is required.
There seems to be a multitude of ways to achieve this. I could store the data in an Azure SQL database, or use Blob Storage. I could use Azure Data Lake Storage. I could use Azure Analysis Services or Azure Synapse Analytics to integrate with Power BI.
For example, as I understand it, I could do something like this:
External DBs-->Azure Data Factory-->Azure Blob Storage/Azure Data Lake-->Azure Synapse Analytics-->Power BI Reports
But it might be better (or worse) to do this:
External DBs-->Azure Data Factory-->Azure SQL Database-->Azure Analysis Services-->Power BI Reports
I am looking for advice on why I would use one particular approach rather than the other.
Thanks.
System2000 (accepted answer, 2022-06-22)
I'm accepting my own comment as the answer to close this question.
As there were thousands of tables in the source databases, I decided it would be more cost-effective to store them in Azure Data Lake Storage Gen2 and create a logical data warehouse in an Azure Synapse serverless SQL pool.
I used Azure Data Factory to ingest the data rather than Azure Synapse pipelines, because the latter had some issues with on-premises integration runtimes. I implemented an incremental loading mechanism and stored the data (as CSV) in a folder structure like this:
filesystem/database_name/table_name/YYYY/MM/DD/data.csv
Where YYYY/MM/DD represents the ingestion date.

In the Synapse serverless SQL pool I created views using OPENROWSET to read from the data lake. However, I found that many of these views failed when relying on inferred data types, so the data types needed to be defined explicitly in each view. Columns with data types such as IMAGE, TEXT, etc. also caused the views to fail, so I needed to exclude those columns. I used | (pipe) and ~ as the field terminator and field quote, as the data contained lots of commas and " marks, which caused failures with the default settings.
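For illustration, a minimal sketch of one of these views might look like this (the storage account, container, table and column names are placeholders, not the real ones):

    CREATE VIEW dbo.vw_customer
    AS
    SELECT
        result.*,
        -- expose the YYYY/MM/DD ingestion-date folders matched by the wildcards
        result.filepath(1) AS ingest_year,
        result.filepath(2) AS ingest_month,
        result.filepath(3) AS ingest_day
    FROM OPENROWSET(
            BULK 'https://mystorageaccount.dfs.core.windows.net/filesystem/database_name/customer/*/*/*/data.csv',
            FORMAT = 'CSV',
            FIELDTERMINATOR = '|',  -- pipe instead of comma, because the data contains commas
            FIELDQUOTE = '~',       -- ~ instead of ", because the data contains " marks
            FIRSTROW = 2            -- assumes a header row was written with each file
        )
        -- explicit data types instead of schema inference, which proved unreliable
        WITH (
            customer_id   INT,
            customer_name NVARCHAR(200),
            created_date  DATETIME2
        ) AS result;

Filtering on ingest_year/ingest_month/ingest_day when querying the view also helps limit how much of the data lake a query has to scan.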
Once the serverless SQL pool was set up in Azure, it was relatively simple to connect to it from Power BI. However, the serverless SQL pool is not ideal for DirectQuery mode in Power BI - there were timeouts and performance issues - so Import mode needed to be used.
In a future phase we should define which tables are useful, then clean/curate this data and convert it to a dimensional model in Parquet format using CETAS, and consider the partitioning structure in the data lake to optimise query performance.
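As a rough sketch of what that CETAS step might look like (the storage account, data source, table and column names below are placeholders, and the external data source and file format would need to be created first, possibly with a database scoped credential):

    -- one-off setup for the curated zone (placeholder names)
    CREATE EXTERNAL DATA SOURCE curated_zone
    WITH ( LOCATION = 'https://mystorageaccount.dfs.core.windows.net/filesystem/curated' );

    CREATE EXTERNAL FILE FORMAT parquet_format
    WITH ( FORMAT_TYPE = PARQUET );

    -- materialise a cleaned dimension as Parquet files via CETAS
    CREATE EXTERNAL TABLE dbo.dim_customer
    WITH (
        LOCATION = 'dim_customer/',
        DATA_SOURCE = curated_zone,
        FILE_FORMAT = parquet_format
    )
    AS
    SELECT
        customer_id,
        UPPER(LTRIM(RTRIM(customer_name))) AS customer_name,  -- example of light cleaning
        created_date
    FROM dbo.vw_customer;  -- the CSV-backed view from the earlier step

As far as I know, a single CETAS statement writes all of its output into one folder, so the partitioning structure in the curated zone would have to come from how the CETAS statements and target folders are organised rather than from CETAS itself.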