Azure Data Factory : Copy Acitivity from Azure databricks to SQL server very slow

james_sm 1 Reputation point
2022-10-14T10:43:23.16+00:00

I have a ADF Pipeline which loads data from Azure Databricks using Delta Lake as source to SQL Server as destination. I am using Copy Actvity by using databricks sql query to pull the data. The same query when run in Databricks cluster takes seconds while ADF using self-hoster IR takes ~minutes to copy. Can someone help here? any quick tips to optimize the performance.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Shivam Kumar 541 Reputation points
    2022-10-14T23:18:16.417+00:00

    The table in sql server where you are inserting the data does it have lots of indexes? What is the recovery model for this database on sql server?
    When there are lots of indexes on table the inserts and updates can slow down , also when the recovery model of sql server database is full all the transactions gets fully logged in transaction logs.
    Check for the Degree of copy Parallelism setting of Copy Data Activity and see if tuning it changes anything.
    Here is a guide for Copy activity performance and scalability
    https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime?tabs=data-factory#scale-considerations


  2. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-10-17T08:17:28.57+00:00

    Hi @james_sm ,

    Thank you for posting query in Microsoft Q&A Platform.

    In your case, Sink is SQL server, So it will not match Direct Copy from delta lake to Sink criteria. Hence Copy activity takes data to Stage first and from there to Sink. So you might observe slowness because of this.

    Click here to know the criteria for direct copy from delta lake to sink using copy activity.

    I would encourage you to go through below documentation to understand how Copy activity behaves while copying data from Delta Lake to sink storages.
    Copy data to and from Azure Databricks Delta Lake using Azure Data Factory or Azure Synapse Analytics

    Consider specifying cluster configuration as mentioned in this document.

    Hope this helps. Please let me know if any further queries.

    -------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.