How do I connect ADF to Azure Delta Lake?

shane want 0 Reputation points
2024-07-22T06:14:23.61+00:00

Are there any plans to provide a connection between ADF v2/Mapping Data Flows and Azure Delta Lake? It would be a great new source and sink for ADF pipelines and Mapping Data Flows, providing full ETL/ELT CDC capabilities to simplify complex lambda data warehouse architecture requirements.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. PRADEEPCHEEKATLA 90,226 Reputation points
    2024-07-22T10:20:15.9933333+00:00

@shane want - Thanks for the question and for using the MS Q&A platform.

Azure Data Factory supports Delta Lake in three ways:

    • The Copy activity, through the Azure Databricks Delta Lake connector, can copy data from any supported source data store to an Azure Databricks delta lake table, and from a delta lake table to any supported sink data store. It leverages your Databricks cluster to perform the data movement.
    • Mapping Data Flow supports the generic Delta format on Azure Storage as both source and sink, letting you read and write Delta files in code-free ETL that runs on the managed Azure Integration Runtime.
    • Databricks activities let you orchestrate code-centric ETL or machine learning workloads on top of delta lake.
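    For the Copy activity path, the Azure Databricks Delta Lake connector is configured as a linked service. Here is a minimal sketch of such a linked service definition; the name, workspace domain, cluster ID, and access token below are placeholders, not values from this thread:

    ```json
    {
        "name": "AzureDatabricksDeltaLakeLS",
        "properties": {
            "type": "AzureDatabricksDeltaLake",
            "typeProperties": {
                "domain": "https://adb-<workspace-id>.azuredatabricks.net",
                "clusterId": "<existing-interactive-cluster-id>",
                "accessToken": {
                    "type": "SecureString",
                    "value": "<databricks-personal-access-token>"
                }
            }
        }
    }
    ```

    In production you would typically reference the access token from Azure Key Vault rather than embedding it inline.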

    To use the Delta format in Mapping Data Flow, you need to specify the following properties in the Source options tab:

    • Format: Format must be delta
    • File system: The container/file system of the Delta Lake
    • Folder path: The directory of the Delta Lake

    You can also specify optional properties such as Compression type and Compression level.
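    As an illustration, in data flow script those source settings might look like the following minimal sketch; the container name, folder path, and stream name are hypothetical:

    ```
    source(
        allowSchemaDrift: true,
        validateSchema: false,
        format: 'delta',
        fileSystem: 'my-container',
        folderPath: 'tables/sales') ~> DeltaSource
    ```

    The linked service that points the source at your storage account is attached to the source transformation in the data flow definition, so it does not appear in the script itself.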

    For more information on using the Delta format connector in Azure Data Factory, please refer to the following documentation: https://docs.microsoft.com/en-us/azure/data-factory/format-delta

    Please note that you need to set up a cluster in Azure Databricks to use the Azure Databricks Delta Lake connector. To copy data to delta lake, the Copy activity invokes the Azure Databricks cluster to read data from Azure Storage, which is either your original source or a staging area to which the service first writes the source data via built-in staged copy. To copy data from delta lake, the Copy activity invokes the Azure Databricks cluster to write data to Azure Storage, which is either your original sink or a staging area from which the service continues moving the data to the final sink via built-in staged copy.
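    To make the staged-copy shape concrete, here is a rough sketch of a Copy activity that loads a delimited file into a delta lake table; all dataset, linked service, and path names here are hypothetical:

    ```json
    {
        "name": "CopyCsvToDeltaLake",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceCsvDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "DeltaLakeTableDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": {
                "type": "AzureDatabricksDeltaLakeSink",
                "importSettings": { "type": "AzureDatabricksDeltaLakeImportCommand" }
            },
            "enableStaging": true,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": "StagingBlobStorageLS",
                    "type": "LinkedServiceReference"
                },
                "path": "staging-container/interim"
            }
        }
    }
    ```

    The enableStaging/stagingSettings block is typically only needed when the original source is not an Azure Storage account that the Databricks cluster can read directly.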

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.

