Read Delta Tables from ADLS Gen2 in Power BI: Exploring Direct Access Methods

Akshay Patel 45 Reputation points
2024-08-23T13:46:12.77+00:00

As part of our data pipeline, I'm using Azure Blob Storage as the source, Azure Data Factory (ADF) for processing, and Azure Data Lake Storage Gen2 (ADLS Gen2) as the destination.

Here's the process I've followed:

Step 1: I read a CSV file from Azure Blob Storage and sink it as a Delta table in ADLS Gen2. The Delta table is created with an inline dataset type (Delta) and is stored in the path container/delta/sales. Upon execution, this creates a delta folder in ADLS Gen2 containing a transaction log folder (_delta_log) and a Parquet data file. A new Parquet file is generated each time the process runs.

[Screenshot 2024-08-23 185216]

Step 2: I tested reading this Delta table in ADF by setting the folder path as the source using data flow. ADF successfully reads the data as a Delta table.

[Screenshot 2024-08-23 185501] [Screenshot 2024-08-23 185520]

Step 3: My final goal is to use this Delta table in Power BI. However, when I attempt to load the data in Power Query by specifying the ADLS Gen2 folder path (following "Read data from delta table from Azure gen 2 storage" on linkedin.com), Power BI returns a list of the files in the folder rather than reading them as a Delta table.

Question:
Given this setup, do we need an intermediary, such as a Spark cluster, to read the data as a Delta table in Power BI? Or is there a way to read the Delta table directly from ADLS Gen2 in Power BI without additional infrastructure?


Accepted answer
  Amira Bedhiafi 24,451 Reputation points
    2024-08-24T13:00:11.1266667+00:00

    When you connect Power BI to a folder in ADLS Gen2, it lists all the files in the folder rather than interpreting them as a Delta table. Power BI can read Parquet files directly, but it doesn't understand the Delta Lake format and its transaction logs.

    Delta Lake is a storage layer that brings ACID (Atomicity, Consistency, Isolation, Durability) transactions to Apache Spark and big data workloads. Reading a Delta table requires understanding both the Parquet files and the Delta transaction log, which Power BI doesn't natively support.
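    To illustrate why the transaction log matters, here is a minimal sketch (pure Python, simulated log entries rather than a real table) of how a Delta reader replays the JSON commits in _delta_log to work out which Parquet files belong to the current snapshot. Simply globbing *.parquet, as Power Query does, skips this step.

    ```python
    import json
    from pathlib import Path
    from tempfile import TemporaryDirectory

    def live_files(delta_log_dir: Path) -> set[str]:
        """Replay _delta_log/*.json commits in order and return the set of
        Parquet files that make up the current table snapshot.
        'add' actions register a data file; 'remove' actions retire one."""
        files: set[str] = set()
        for commit in sorted(delta_log_dir.glob("*.json")):
            for line in commit.read_text().splitlines():
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
        return files

    # Simulate two commits: the first adds part-0000, the second replaces
    # it with part-0001 (as an overwrite or update would).
    with TemporaryDirectory() as tmp:
        log = Path(tmp)
        (log / "00000000000000000000.json").write_text(
            json.dumps({"add": {"path": "part-0000.parquet"}}) + "\n")
        (log / "00000000000000000001.json").write_text(
            json.dumps({"remove": {"path": "part-0000.parquet"}}) + "\n"
            + json.dumps({"add": {"path": "part-0001.parquet"}}) + "\n")
        print(live_files(log))  # {'part-0001.parquet'}
    ```

    Both part files still sit in the folder on disk, but only part-0001.parquet is part of the table; a reader that ignores the log would also pick up the retired file.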

    You could use Azure Synapse Analytics or Azure Databricks to read the Delta table, convert it to a format Power BI can consume, and then export it to a location that Power BI can access (another set of Parquet files, a CSV, or direct connection via an SQL endpoint).

    Power BI has a built-in connector for Azure Databricks. You can load your Delta table into a Databricks cluster and then use Power BI to query the data through the Databricks connector.

    If your Delta table doesn't require the ACID transaction features, you could read the Parquet files from the Delta table folder in Power BI directly. However, this method ignores the Delta Lake transaction log, so it can pick up stale Parquet files that the log has logically removed (for example, files left behind by overwrites or compaction) rather than giving you a clean current snapshot.
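    A minimal Power Query M sketch of that direct-read approach, assuming your storage account URL and path (replace <account> and the container path with your own values; this cannot be run without valid Azure credentials):

    ```powerquery-m
    // Reads every Parquet data file under the Delta folder directly.
    // Caveat: this bypasses _delta_log, so retired files may be included.
    let
        Source = AzureStorage.DataLake("https://<account>.dfs.core.windows.net/container/delta/sales"),
        ParquetOnly = Table.SelectRows(Source, each Text.EndsWith([Name], ".parquet")),
        Combined = Table.Combine(List.Transform(ParquetOnly[Content], Parquet.Document))
    in
        Combined
    ```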

    You could add a step in Azure Data Factory to transform the Delta table into a flat table (Parquet or CSV) that Power BI can read directly.

    4 people found this answer helpful.