Load data into synapse delta lake from Azure SQL database using synapse pipeline

heta desai 252 Reputation points
2022-07-11T12:36:31.323+00:00

I want to load data from azure sql database to delta lake table using synapse pipeline. I tried using copy activity but there is no target type available to load data into delta lake directly. So, using copy activity I am reading data from azure sql and writing them into csv file in Azure data lake storage. Then using notebook I am writing data from csv to delta lake table. all the steps are performing using pipeline.

I am using metadata driven approach to pull the multiple tables from source so single pipeline will be used. once metadata is read, writing to csv and writing to delta lake using notebook will be inside the for each loop. The performance is too poor for this approach.

I am not sure about the approach I am using for the requirement. I need to improve the performance of the pipeline.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,877 questions
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 32,821 Reputation points Microsoft Employee
    2022-07-12T16:29:20.587+00:00

    Hi @Anonymous ,

    Thank you for posting query in Microsoft Q&A Platform.
    Delta format is not available in copy activity. Hence you are not able to see option.

    In your case, I would suggest you use dataflows directly to take data from azure sql database to delta. Delta format is available in data flows as inline datasets. Click here to know more about it.

    Please note, data flows also works on spark. In above approach we dont need to have middle layer of taking tables data to files and then loading to destination.

    Below link may come handy to understand various tips to follow for better performance with data flows.
    https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-performance

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. AaronHughes 391 Reputation points
    2022-07-12T20:40:35.283+00:00

    Not supported in Pipelines yet

    if you want to work with delta in this way you would need to use the SparkPool and code like Python to interact with the delta in adlsv2


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.