Share via

Copy data from cosmos db (complex objects) to datalake gen 2 flattened structure (CSV or TSV format)

Prasad Nikumbh 1 Reputation point
2021-06-07T14:47:40.947+00:00

I am using Azure data factory(ADF) Pipeline-Copy Activity to copy data from cosmos db to datalake.I have created linkedservice for source (Cosmos db ) and destination (Data lake gen 2- Delimited format (.tsv))as I want my output in tsv format or csv..
which is working fine for me for simple objects.
But,I want some suggestion for copying complex objects from cosmos db to flattened tsv format.

eg:-> Source :->
{
"id":"30b00315-bae9343442333",
"TestId":"12345",
"Information":
[
{"Info":
[
{"InfoID":1,"result":"4"},
{"InfoID":2,"result":"3"},
{"InfoID":3,"result":"3"},
{"InfoID":4,"result":"4"},
]

"lastModified":"2021-01-2T08:36:09.4961916",
"lastModifiedBy":"SYSTEM",
"flag":1
}
],
"created":"2021-01-12T08:36:09.49561916",
"createdBy":"SYSTEM",
"isNew":true,
"eTag":"\"6100478c-0000-0700-342343440000\""
}

Destination ouput should be in:->
Id TestId InfoId Result LastModified LastModifiedBy CreatedBy flag
30b00315-bae9343442333 12345 1 4 2021-01-2 System 2021-01-2 1
30b00315-bae9343442333 12345 2 3 2021-01-2 System 2021-01-2 1
30b00315-bae9343442333 12345 3 3 2021-01-2 System 2021-01-2 1
30b00315-bae9343442333 12345 4 4 2021-01-2 System 2021-01-2 1

As you can see above in source,Information is array and inside it,there is another field Info which is another array.I want this complex nested arrays in flattened structure.
How we can achieve it? Please suggest some way.

Thanks in advance.

I have tried below options
1.Copy Activity Mapping schema :- But it will take only first index.
2.Data flow activity But it wont work for nested array objects.Let me know if i am missing something from my end.

Azure Cosmos DB
Azure Cosmos DB

An Azure NoSQL database service for app development.

Azure Data Factory
Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


4 answers

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,822 Reputation points Microsoft Employee Moderator
    2021-06-15T18:07:43.633+00:00

    Hi @Prasad Nikumbh ,

    Thanks for sharing the alternate solution details.
    As mentioned by @MarkKromer-MSFT , you can use Flatten transformation to achieve your requirement.
    I did tried using the source data you have provided and was able to flatten using the flatten transformation.

    Here is the source data used for testing: 105845-nestjsonarraysource.txt

    105859-image.png

    Then add flatten and unroll by deepest array, then click Reset input columns, it will display the nested array columns unrolled. Then click Add mapping and include the input columns from source that you would want in the sink.

    105904-image.png

    Then do data preview of the flatten transformation to check the output.

    105905-image.png

    Here is the GIF of implementation:

    105846-flattentransformationcomplexjson.gif

    Hope this helps.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    Was this answer helpful?


  2. Prasad Nikumbh 1 Reputation point
    2021-06-14T12:27:01.037+00:00

    @MarkKromer-MSFT For now, If i want to flatten i can do it from source itself i.e. I can use join and write custom query (Cosmos) in source and then can push the same flatten data to datalake. Is there any way or feature by which complex objects directly gets flattened and can push to datalake or whatever way we want.(If not, Please let us know if any new feature related to this coming in future).

    Was this answer helpful?

    0 comments No comments

  3. MarkKromer-MSFT 5,231 Reputation points Microsoft Employee Moderator
    2021-06-08T17:44:52.707+00:00

    Today, you would have to start with the deepest array and unroll each individually. However, we have a new "deep traversal" feature landing soon that should help with this.

    Was this answer helpful?


  4. Stefano Peroli 6 Reputation points
    2021-06-07T20:25:53.193+00:00

    Hi @Prasad Nikumbh ,
    I think that you could use the ADF Data flow Flatten transformation.
    Here an example https://www.youtube.com/watch?v=VY2tFQJoAXE

    Hope it helps.

    — Remeber to accept answer if it solves you question —

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.