ADF Copy activity taking time to load from Blob to Cosmos DB

Devashish Parmar (BLR GSS) 86 Reputation points
2023-01-11T07:01:30.87+00:00

I am loading .json files from Azure Blob Storage into Cosmos DB, but the entire load takes 1.5 hours. Can you suggest how to improve the performance?

Below are the details.

No. of .json files: 1

No. of JSON records: 3 lakh (300,000)

Size of .json file: 300 MB

Time taken to load: 1.5 hrs
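
For scale, the figures above can be turned into an effective write rate. This is a minimal sketch in Python, assuming "3L" means 3 lakh (300,000) records and that the full 1.5 hours is spent in the copy step; the function name is illustrative only.

```python
# Figures taken from the question; "3L" is read as 3 lakh = 300,000 (assumption).
RECORDS = 300_000
FILE_SIZE_MB = 300
LOAD_TIME_HOURS = 1.5


def observed_throughput(records, size_mb, hours):
    """Return the effective write rate and average record size of a load."""
    seconds = hours * 3600
    return {
        "records_per_second": records / seconds,
        "mb_per_second": size_mb / seconds,
        "avg_record_kb": size_mb * 1024 / records,
    }


if __name__ == "__main__":
    for name, value in observed_throughput(
        RECORDS, FILE_SIZE_MB, LOAD_TIME_HOURS
    ).items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the load runs at roughly 56 records per second, with an average record size of about 1 KB.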

Below are the 3 steps in my pipeline. As you can see, the 3rd step, 'Blob to CosmosDB', takes 1.5 hrs.

User's image

Below are the statistics for the 3rd step.

User's image

The Blob storage account and Cosmos DB are also in the same region, 'Central US'.

Can someone help me here?

Azure Data Factory

Answer accepted by question author
  1. KranthiPakala-MSFT 46,737 Reputation points Microsoft Employee Moderator
    2023-01-19T01:16:43.3866667+00:00

    Hi @Devashish Parmar (BLR GSS),

    Thanks for using Microsoft Q&A forum and posting your query.

    It seems the images in the original question aren't available. Could you please re-upload them so that we can understand the problem better and assist accordingly?

    Please try either of the following two recommendations and see whether it improves copy performance:

    • Increase the container's provisioned RUs in Azure Cosmos DB. This improves copy activity performance, but it incurs more cost in Azure Cosmos DB.
    • Decrease writeBatchSize to a smaller value, such as 1000, and decrease parallelCopies to a smaller value, such as 1. This reduces copy run performance, but it doesn't incur more cost in Azure Cosmos DB.
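
    As a rough illustration of where the second option's settings live, the sketch below builds a Copy activity definition as a Python dict and prints it as JSON. It assumes the sink is Azure Cosmos DB for NoSQL (sink type CosmosDbSqlApiSink) and a JSON source; the activity name is taken from the question, the function name is hypothetical, and dataset references and other required properties are omitted, so treat it as a fragment rather than a complete pipeline.

```python
import json


def build_copy_activity(write_batch_size=1000, parallel_copies=1):
    """Return a partial Copy activity definition with the two tuning knobs set.

    Assumes a Cosmos DB for NoSQL sink; inputs/outputs are intentionally omitted.
    """
    return {
        "name": "Blob to CosmosDB",  # step name as given in the question
        "type": "Copy",
        "typeProperties": {
            "source": {"type": "JsonSource"},
            "sink": {
                "type": "CosmosDbSqlApiSink",
                "writeBehavior": "insert",
                # Number of documents written per batch by the sink.
                "writeBatchSize": write_batch_size,
            },
            # Degree of parallelism for the copy run.
            "parallelCopies": parallel_copies,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_copy_activity(), indent=2))
```

    The same properties can be edited directly in the pipeline JSON or on the Copy activity's Sink and Settings tabs.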

    In addition, I highly recommend watching this data migration video from the product team. It walks through the steps of copying data from Azure Blob storage to Azure Cosmos DB using ADF and describes performance-tuning considerations for ingesting data into Azure Cosmos DB in general: Data Migration: Azure Blob Storage to Azure Cosmos DB using Azure Data Factory

    Additional info: I recommend reading through this existing SO thread to see whether it helps mitigate your issue: https://stackoverflow.com/questions/71443914/azure-data-factory-copy-to-cosmosdb-throttling
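
    One way to reason about throttling is to compare the write load against the provisioned throughput. The sketch below is plain arithmetic, not an ADF or Cosmos DB API: the RU cost per write is an assumed placeholder (read the real request charge from your own workload), and both function names are hypothetical.

```python
import math


def required_ru_per_second(records, ru_per_write, target_seconds):
    """RU/s needed to write `records` documents within `target_seconds`."""
    return math.ceil(records * ru_per_write / target_seconds)


def min_load_seconds(records, ru_per_write, provisioned_ru_per_second):
    """Lower bound on load time when throughput is capped by provisioned RU/s."""
    return records * ru_per_write / provisioned_ru_per_second


if __name__ == "__main__":
    RECORDS = 300_000          # record count from the question
    ASSUMED_RU_PER_WRITE = 10  # assumption for illustration, not a measured value

    # RU/s needed to finish in 10 minutes under the assumed per-write cost.
    print(required_ru_per_second(RECORDS, ASSUMED_RU_PER_WRITE, 10 * 60))
    # Best-case load time, in seconds, for a container provisioned at 400 RU/s.
    print(min_load_seconds(RECORDS, ASSUMED_RU_PER_WRITE, 400))
```

    If the estimated requirement is well above what the container is provisioned for, throttling is a likely contributor to the long copy time.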

    I hope this info helps. Let us know how it goes.

    Thank you.


    Please don't forget to Accept Answer and Up-Vote wherever the information provided helps you; this can be beneficial to other community members.


1 additional answer

  1. Silvia Wibowo 6,071 Reputation points Microsoft Employee Volunteer Moderator
    2023-01-11T19:15:14.3833333+00:00

    Hi @Devashish Parmar (BLR GSS)

    It may help to use a format other than JSON, for example Parquet or ORC. For data processing, JSON typically performs worse than these formats in terms of both size and processing time.

    Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer.

