Extract elements in an array of complex object in Azure Data Factory dataflow

Zixuan Li 46 Reputation points Microsoft Employee
2023-07-21T01:44:14.5266667+00:00

Hello, in a json document there's a field items, which is an array of complex object

{
	"id": 1234,
	"items": [
        {
            "id": 1,
            "name": "A",
            "owners": [
                "Bob",
                "Steve"
            ]
        },
        {
            "id": 2,
            "name": "B",
            "owners": [
                "Mike"
            ]
        },
        {
            "id": 3,
            "name": "A",
            "owners": [
                "Bob",
				"Mike"
            ]
        }
    ]
}

In the ADF dataflow, I would like to extract items.name and items.owners into separate arrays and distinct the values in the new arrays. So in the output, there will be two new array fields:

{
	"id": 1234,
	"itemNames": ["A", "B"],
	"itemOwners": ["Bob", "Steve", "Mike"]
}

I tried the flatten activity in dataflow but it unrolls the items object into individual rows, which is not what I expect. I also tried to add mapping in DerivedColumns activity as the screenshot, but items.name turns out to be a string instead of an array. What would be the best way to exact all the item names inside the items array? Any suggestions would be appreciated! Thank you!

User's image

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-07-21T17:38:49.04+00:00

    Hi Zixuan Li ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding , you want to transform the json data using mapping dataflow. Please let me know if that is not the case.

    To achieve the desired result, you just need a flatten transformation and an aggregate transformation.

    • First of all, in source transformation , select document form as 'array of documents' under json settings in source option.

    User's image

    • Now, use a flatten transformation to convert the json data into tabular format

    User's image

    User's image

    • Use aggregate transformation to group by 'id' and in aggregates tab, create two columns 'itemNames' : collectUnique(items) and 'itemOwners': collectUnique(owners)

    User's image

    User's image

    • Now load the data using sink transformation in json file.

    Look at the output file: User's image

    Hope it helps. Please accept the answer if it's helpful. Thankyou


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.