Load data from Delta Lake to Synapse

Muruga MuthuKrishnan 26 Reputation points
2023-02-23T14:08:02.2433333+00:00

Hi Team,

We are using Synapse Analytics, in ADLS we are storing the files in Delta format. Now we need to load the data which is in Delta format from ADLS to Synapse. We need to know which is the optimal, performant & cost effective approach to load the data in Delta format from ADLS to Synapse. Please share associated artifacts or tutorials for the same.

If we have multiple approaches then we need to provide Pros & Cons of each approach.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,375 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-02-28T20:05:50.2766667+00:00

    Hello @Anonymous ,

    There are multiple methods to load Delta format data from ADLS to Synapse. The optimal, performant, and cost-effective approach to load Delta format data from ADLS to Synapse depends on your specific requirements and use case.

    Here are some of them.

    PolyBase External Tables: PolyBase External Tables allow you to create an external table in Synapse using PolyBase to connect to the Delta files in ADLS. You can then query the external table in Synapse using T-SQL statements.

    This approach is cost-effective since it doesn't require any data movement. However, performance may be slower compared to other options since the data is read directly from the ADLS files.

    Pros:

    • No data movement, which can reduce costs
    • Simple to set up and use
    • Suitable for simple use cases where performance is not critical

    Cons:

    • It may not be as performant as other options since the data is read directly from ADLS files
    • Limited functionality compared to other options

    Reference document: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview

    Azure Data Factory: Azure Data Factory can be used to copy data from ADLS to Synapse. ADF provides a Delta format connector that you can use to copy data in Delta format from ADLS to Synapse.

    This approach allows for more flexibility since you can transform and clean the data during the copy process. However, it involves data movement, which can increase costs.

    Pros:

    • Flexible and customizable
    • Suitable for complex use cases with large amounts of data
    • Can transform and clean data during the copy process

    Cons:

    • Involves data movement, which can increase costs
    • May have higher latency compared to other options

    Reference document: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-delta-lake

    Azure Databricks: Azure Databricks can be used to load data from Delta files in ADLS to Synapse. This involves using the Azure Synapse Spark connector to write data to Synapse.

    This highly performative approach allows for transformations and data processing using Spark. However, it may be more expensive compared to other options since it involves running a Databricks cluster.

    Pros:

    • Highly performant
    • Allows for complex transformations and processing of data using Spark
    • Suitable for large-scale data processing

    Cons:

    • More expensive compared to other options since it involves running a Databricks cluster
    • It may have a steeper learning curve compared to other options

    Reference document: https://learn.microsoft.com/en-us/azure/databricks/delta/

    Azure Synapse Studio: Azure Synapse Studio is an integrated workspace that provides a unified experience for developing and managing big data and data warehousing solutions. Azure Synapse Studio can load data in Delta format from ADLS to Synapse.

    You can either use the built-in data ingestion tools or write custom code to load the data. This approach provides a user-friendly interface and allows for customization, but it may not be as performant as using Azure Databricks.

    Each method has its own pros and cons, and the optimal method depends on your specific use case and requirements.

    Pros:

    • User-friendly interface
    • Allows for customization
    • Suitable for simple use cases

    Cons:

    • It may not be as performant as other options
    • Limited functionality compared to other options

    Reference document: https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-load-studio-sql-pool

    I hope this helps. Please let me know if you have any further questions,

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.