How to check the size of a Cosmos DB document, and is there any limit when loading Azure DWH from Cosmos DB?

Shailendra Kad 11 Reputation points
2022-11-28T15:56:29.823+00:00


I am asking this question because of some issues I have been facing lately. When I try to insert data from Cosmos DB into Azure DWH, the load works fine for most of the databases, but for some it fails with strange errors. We later found out that this is related to the size of the Cosmos DB documents; one of our Cosmos databases is about 75 GB, for example.

If we try to insert all the data in the initial load, we get a Null Pointer error. But if we limit the rows, say to the first 3000, and then increment the record count by 3000 on each iteration, the load succeeds, although it takes a significant amount of time. Also, this is our ACC data; we are not sure how our PRD data will behave. Right now some of the databases need to be set to 50,000 rows per load, while for others we have set 3000 (as in the example above).

So is loading the data iteratively the only solution, or is there another way? Also, how can we determine the increment value to use per iteration for new databases as they are added?

P.S. I also tried increasing the DWUs and IR cores to the maximum, but no luck.
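For context, below is a minimal sketch of one way to inspect per-document sizes, assuming the azure-cosmos Python SDK; the account endpoint, key, database and container names are placeholders. It serializes each item and measures the JSON length, which approximates the stored size (Cosmos DB caps a single item at 2 MB):

```python
import json
from azure.cosmos import CosmosClient

# Placeholder connection details -- replace with your own account.
ENDPOINT = "https://<your-account>.documents.azure.com:443/"
KEY = "<your-key>"

client = CosmosClient(ENDPOINT, credential=KEY)
container = client.get_database_client("<database>").get_container_client("<container>")

# Approximate each document's stored size by measuring its serialized JSON.
sizes = []
for item in container.read_all_items():
    size_bytes = len(json.dumps(item).encode("utf-8"))
    sizes.append((item["id"], size_bytes))

# Show the ten largest documents; a single Cosmos DB item is limited to 2 MB.
for doc_id, size_bytes in sorted(sizes, key=lambda t: t[1], reverse=True)[:10]:
    print(f"{doc_id}: {size_bytes / 1024:.1f} KB")
```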

Tags: Azure Synapse Analytics · Azure Cosmos DB · Azure Data Factory

2 answers

Sort by: Most helpful
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-11-30T06:02:55.137+00:00

    Hi @Anonymous ,

    Thank you for posting query in Microsoft Q&A Platform.

    I assume you are using an ADF pipeline here to load the data from Cosmos DB to Azure DWH.

    If you are facing issues with a single full load, then yes, processing the data in batches with an incremental, iterative approach is a good idea. You can try increasing the number of records processed per batch from 3000 and see whether that makes the processing quicker; a sketch of this kind of batched read is shown below.
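    For illustration, here is a minimal sketch of such a batched read, assuming the azure-cosmos Python SDK; the connection details and batch size are placeholders. Each loop iteration corresponds to what one Copy activity run would do in ADF with an OFFSET/LIMIT source query:

    ```python
    from azure.cosmos import CosmosClient

    # Placeholder connection details -- replace with your own account.
    client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                          credential="<your-key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    batch_size = 3000   # the knob to tune upward and compare load times
    offset = 0
    while True:
        query = f"SELECT * FROM c OFFSET {offset} LIMIT {batch_size}"
        batch = list(container.query_items(query=query,
                                           enable_cross_partition_query=True))
        if not batch:
            break                      # no more documents to read
        # ...hand this batch to the sink (in ADF, one Copy activity run)...
        offset += batch_size
    ```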

    The error you are getting is a little confusing to me: it is not saying that the size is too large, it is saying Null Pointer error. So I doubt whether size is really the concern here, or whether it is something else.

    Could you please try the "Skip incompatible rows" fault tolerance setting in the Copy activity and see how it behaves?

    Hope this helps. Please let me know how it goes.

    -------
    Please consider hitting the Accept Answer button. Accepted answers help the community as well.


  2. Shailendra Kad 11 Reputation points
    2022-12-08T08:40:40.207+00:00

    Hi @ShaikMaheer-MSFT ,
    I also thought there might be incompatible rows. But when we analysed it, we found that the error appears after 6190 rows: once I set the limit to 6191 rows to read, it gives a NullPointer error.
    So we tried to load only that row on its own, and it loaded fine.
    Then, after a response from one of the Microsoft support members, I implemented incremental logic to load the rows iteratively.
    There are 290k rows. When I set the limit to 6k, I got the error in the second iteration.
    Then I reduced the limit to 3000, and I got the error at the 8th or 9th iteration, i.e. after about 24,000 rows had been loaded.
    So I set it to 1000, and then I could load all the rows. But I started the load 3 days ago and it is still running, as we have 290k records (roughly 290 iterations at 1000 rows each).
    If anything is unclear, please let me know; we can also have a Teams call to discuss further.

    It is very complex and difficult to explain everything in writing.

