Azure Data Factory tempdb usage in data flow

Jogalekar, Mandar 56 Reputation points
2021-06-06T10:34:13.707+00:00


I am using a data flow to copy a large volume of data to an Azure SQL database.

I received the following error:

Operation on target Data flow1 failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'xxxx': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: The database 'tempdb' has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions.","Details":"at Sink 'xxx': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: The database 'tempdb' has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions."}
To resolve this, I created a different schema in the database, unchecked the "Use TempDB" option in the data flow sink settings, and provided the name of the schema I created.
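
For reference, setting up such a schema looks roughly like this (a minimal sketch; the schema name TMP and login adf_user are illustrative placeholders, not my actual values):

-- Create a dedicated schema on the sink database for the
-- data flow's temporary staging table.
CREATE SCHEMA TMP AUTHORIZATION dbo;
GO

-- The identity the data flow connects with needs to be able to
-- create and drop its staging table in that schema.
GRANT CREATE TABLE TO [adf_user];
GRANT ALTER ON SCHEMA::TMP TO [adf_user];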

I continue to get the same error, and there is not much information available about solving it.

Azure SQL Database
Azure Data Factory

Accepted answer
  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2021-06-09T19:12:12.81+00:00

    Hello @Jogalekar, Mandar ,
    As @Alberto Morillo mentioned, the change you made means the staging table is now created in the user database instead of in tempDB. If I were you, I would try one of the following:

    • Upgrade the database to a higher tier and scale it back down after the ingestion ( @Alberto Morillo also suggested this ); the downside is that this will increase your Azure bill. A T-SQL sketch follows this list.
    • Set the batch size on the sink; the challenge is finding the right batch size, and this option will be comparatively slower than the one above.
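
    For the first option, a minimal T-SQL sketch; [mydb] and the service objectives shown are placeholders, so pick the tiers that fit your workload. Note that scaling is asynchronous, so the statement returns before the new tier is active.

    -- Run against the master database of the logical server.
    -- Scale up before the ingestion (example target: Premium P2).
    ALTER DATABASE [mydb] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2');

    -- ... run the data flow ...

    -- Scale back down afterwards to control cost (example: Standard S2).
    ALTER DATABASE [mydb] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');

    -- Check progress of the scale operation from master:
    SELECT operation, state_desc, percent_complete FROM sys.dm_operation_status;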

    Thanks
    Himanshu
    Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members



3 additional answers

  1. Alberto Morillo 33,696 Reputation points MVP
    2021-06-06T21:07:25.063+00:00

    By default, Data Factory uses a global temporary table in tempDB to store data as part of the loading process. You can alternatively uncheck the "Use TempDB" option on the Sink transformation, which asks Data Factory to store the temporary holding table in a user schema within the database that is being used for this Sink.

    Another possible workaround is to scale the database up to the Premium tier. You can also have ADF scale up the database before the ETL process and scale it back down when finished.
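
    To confirm how much of tempdb the load is actually consuming, a query along these lines can be run against the sink database while the data flow executes (a sketch using the standard sys.dm_db_file_space_usage DMV; the tempdb size quota itself is fixed by the service tier):

    -- tempdb pages are 8 KB, so page_count * 8 / 1024 = MB.
    SELECT
        SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,     -- temp tables
        SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb, -- sorts, hashes, spools
        SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
        SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
    FROM tempdb.sys.dm_db_file_space_usage;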


  2. Jogalekar, Mandar 56 Reputation points
    2021-06-10T07:16:58.337+00:00

    In case anyone else faces this issue, the only solution that worked for me was to increase my DB tier to a compute-heavy one at a significantly higher price.

    I used the Business Critical tier of Azure SQL with 4 vCores.


  3. Balasubramanian, N (Nagarajan) 21 Reputation points
    2021-06-18T14:10:26.403+00:00

    I am also facing a similar issue in a data flow. As per the documentation, Data Factory uses a global temporary table to store data as part of the loading process. You can alternatively uncheck the "Use TempDB" option on the Sink transformation, which asks Data Factory to store the temporary holding table in a user schema within the database that is being used for this Sink.

    I also unchecked the TempDB option in the sink and specified the schema TMP. I can see the temp table created in the TMP schema, and I monitored the tempdb usage on the server. The query does an upsert of 20 million rows into an existing table of 10 million rows.

    1. Why is tempDB space used even though I specified a user schema? (See the diagnostic sketch below.)
    2. I upgraded the SQL DB from Hyperscale Gen5 2 vCores to Hyperscale Gen5 16 vCores, and it still consumed all 512 GB. How do I find the right vCore size for the running query?
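
    While the data flow runs, tempdb consumption per session can be broken down with a query like this (a sketch using the standard tempdb session-space DMV); a large internal-objects figure would point to sort/hash spills from the upsert itself rather than the staging table:

    -- Net tempdb pages held per session (8 KB pages -> MB).
    SELECT TOP 10
        session_id,
        (user_objects_alloc_page_count - user_objects_dealloc_page_count)
            * 8 / 1024 AS user_objects_mb,     -- temp/staging tables
        (internal_objects_alloc_page_count - internal_objects_dealloc_page_count)
            * 8 / 1024 AS internal_objects_mb  -- sorts, hashes, spools
    FROM tempdb.sys.dm_db_session_space_usage
    ORDER BY internal_objects_mb DESC;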

    @Poel van der, RE (Ron)
    @MarkKromer-MSFT

