Question

JithinMV-2056 asked · JithinMV-2056 edited

How to efficiently migrate MongoDB to Azure Cosmos DB with Azure Databricks?

While searching for a service to migrate our on-premises MongoDB to Azure Cosmos DB (Mongo API), we came across Azure Databricks. We have a total of 186 GB of data, which we need to migrate to Cosmos DB with as little downtime as possible. How can we improve the data transfer rate? Any insights into this big-data, Spark-based PaaS from Azure would be very helpful. Thank you.

azure-databricks · azure-cosmos-db · azure-database-migration

1 Answer

SaurabhSharma-msft answered · JithinMV-2056 edited

Hi @jithinmv-2056,

Thanks for using Microsoft Q&A!
Your migration performance will depend on a few factors and can be tuned with the following configurations:
1. The number of workers and cores in the Spark cluster
2. maxBatchSize
3. The MongoDB Spark partitioner and partition key

So, to increase the data transfer rate, adjust the number of workers executing tasks, and use maxBatchSize to control the rate at which data is saved to Azure Cosmos DB. You should also disable indexes during the data transfer to improve the rate further; a sketch of both is below.
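As a rough illustration, here is a minimal PySpark sketch of that setup, assuming the MongoDB Spark connector; the connection strings, database, and collection names are placeholders, and the right option values will depend on your cluster and container:

```python
from pymongo import MongoClient
from pyspark.sql import SparkSession

# Placeholder connection strings -- substitute your own.
SOURCE_URI = "mongodb://<on-prem-host>:27017"
TARGET_URI = "mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true"

# Optionally drop non-_id indexes on the target before the bulk load
# (recreate them afterwards); fewer indexes means cheaper writes.
MongoClient(TARGET_URI)["targetdb"]["mycollection"].drop_indexes()

spark = SparkSession.builder.appName("mongo-to-cosmos").getOrCreate()

# Read from the source MongoDB. The partitioner controls how the
# collection is split into Spark partitions, i.e. how many tasks
# your workers can run in parallel.
df = (spark.read.format("mongo")
      .option("uri", SOURCE_URI)
      .option("database", "sourcedb")
      .option("collection", "mycollection")
      .option("partitioner", "MongoSamplePartitioner")
      .load())

# Write to Cosmos DB (Mongo API). maxBatchSize caps the documents per
# bulk insert; lower it if you see 429 (throttling) errors.
(df.write.format("mongo")
 .option("uri", TARGET_URI)
 .option("database", "targetdb")
 .option("collection", "mycollection")
 .option("maxBatchSize", 100)
 .mode("append")
 .save())
```

More workers (and more partitions) give you more parallel write tasks, so throughput scales up until you hit the container's provisioned RU/s.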

Please refer to Optimize the migration performance for details.
Please let me know if you have any additional questions.

Thanks
Saurabh


Hi @jithinmv-2056,
Please let me know if you have any other questions.

Thanks
Saurabh


Hi @jithinmv-2056,
I have not heard back from you. Did my answer solve your issue? If so, please mark as accepted answer. If not, please let me know how I may better assist.

Thanks
Saurabh


Sorry for the late reply. Based on your answer, I have some further questions:

1. How do I disable indexes while doing the data migration?

2. I am planning to migrate to a serverless Cosmos DB (it has a 5,000 RU/s limit per container), so we cannot increase the RUs further to raise the write throughput.

3. As I said, I have 186 GB of data, and I have decided to migrate all of it with Azure Databricks.

4. Downtime doesn't matter. My concern now is that, while looking at the Azure Databricks service, I came across a lot of different plans. Can you please tell me which plan might be suitable for me, and what is the maximum number of workers needed in a cluster?

I am using the Python script in the following link to do the job: migrate-databricks. I am planning to have 36 notebooks, as I have 36 collections.
Thank you so much for helping :)
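For reference, instead of 36 separate notebooks, I suppose a single notebook could also loop over the collections. A rough sketch with placeholder names and URIs, assuming the same MongoDB Spark connector pattern as in the answer:

```python
from pyspark.sql import SparkSession

# Placeholder values -- the real URIs and names come from my setup.
SOURCE_URI = "mongodb://<on-prem-host>:27017"
TARGET_URI = "mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true"
collections = ["users", "orders", "events"]  # ...all 36 collection names

spark = SparkSession.builder.appName("mongo-to-cosmos-all").getOrCreate()

for name in collections:
    # Copy each collection; a small maxBatchSize helps stay under the
    # serverless container's 5,000 RU/s cap.
    df = (spark.read.format("mongo")
          .option("uri", SOURCE_URI)
          .option("database", "sourcedb")
          .option("collection", name)
          .load())
    (df.write.format("mongo")
     .option("uri", TARGET_URI)
     .option("database", "targetdb")
     .option("collection", name)
     .option("maxBatchSize", 100)
     .mode("append")
     .save())
```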

