Flatten Complex Nested JSON (not array) Structure in ADF

TIANSHU LIN 20 Reputation points
2023-05-25T02:39:07.7866667+00:00

I'm trying to flatten the following JSON structure in Azure Data Factory, however so nested JSON is not an array hence flatten transformation doesn't work. New to JSON so any help or suggestions would be much appreciated. I would expect the result like this.

Entity TypeName Attributes.qualifiedName Attributes.DisplayName Classification

123 abc abc abc abc

234 abc def def def

344 def def def def

135 abc abc abc abc


{
    "Entities": {
        "123": {
            "typeName": "abc",
            "attributes": {
                "qualifiedName": "abc",
                "displayName": "abc"
            },
            "classifications": [
                {
                    "Name": "abc"
                }
            ],
            "labels": []
        },
        "234": {
			"typeName": "abc",
			"attributes": {
				"qualifiedName": "def",
				"displayName": "def"
			},
			"classifications": [
				{
				"Name": "def"
				}
			],
			"labels": []	
    	},
        "345": {
			"typeName": "def",
			"attributes": {
				"qualifiedName": "def",
				"displayName": "def"
			},
			"classifications": [
				{
				"Name": "def"
				}
			],
			"labels": []	
    	},
        "135": {
            "typeName": "abc",
            "attributes": {
                "qualifiedName": "abc",
                "displayName": "abc"
            },
            "classifications": [
                {
                    "Name": "abc"
                }
            ],
            "labels": []
		}
}
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,100 questions
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 33,986 Reputation points Microsoft Employee
    2023-05-29T08:57:14.3566667+00:00

    Hi TIANSHU LIN ,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As I understand your query, you are trying to flatten the above json using flatten transformation in mapping dataflow , however, the 'entities' property is not an array which makes it difficult to flatten.

    You can use aggregate transformation , keep 'group by' tab as blank and in 'aggregate tab' use collect function for 'entity' property to convert the json into array. You can then use the same to flatten the data. However, when I tried to repro your case , it looks like each of the jsons present in entities are having different key names ie. '123','234' etc . Although the json is valid, however, it doesn't have a fixed schema. So, what you are trying to achieve as the output dataset is not possible via mapping dataflow.

    User's image

    User's image

    I would encourage you to write your own custom code using C#,JAVA or python and execute the same via custom activity in ADF.

    Hope it helps. Please accept the answer by clicking on Accept answer button


2 additional answers

Sort by: Most helpful
  1. Sina Salam 15,011 Reputation points
    2023-05-25T10:11:39.11+00:00

    Hi @TIANSHU LIN Welcome to Microsoft Q&A and thank you for posting your questions here!

    To understand your question, you would like to flatten Complex Nested JSON structure that is not an Array in Azure Data Factory.

    Most of all, to flattens a complex JSON object, you can use the JSON Flatten tool to normalize and flatten complex JSON objects by key name. The object can be ordered alphabetically, either ascending or descending (reversed). The online utility can be used for unflattening or flattening deeply nested JSON objects.

    You can also use the flatten_json package which can be installed with pip install flatten-json. This package provides a function that takes a nested JSON object and returns a flattened dictionary.

    You can use the two links below to see an examples from similar question here:

    https://www.coderstool.com/flatten-json

    https://stackoverflow.com/questions/58442723/how-to-flatten-a-nested-json-recursively-with-flatten-json

    Now, to flattens a complex nested JSON structure that is not an array in Azure Data Factory, you can use the Select activity and delete all nested arrays. Then, for each nested array, use a select activity and select the nested data, not the array itself.

    Similar question has been answered in the past on this platform, you can read more here: https://learn.microsoft.com/en-us/answers/questions/931931/how-to-flatten-a-complex-nested-json-structure-in

    You can also use Python to flatten multilevel/nested JSON. You can use the flatten_json library and concatenate keys (e.g. with dot as separator) like {‘a’: {‘b’: 1}} -> {‘a.b’: 1}

    There is a resource from Microsoft Document Library for Flatten transformation in mapping data flow in Azure Data Factory and many others from this link: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten

    I hope this is helpful.

    Regards,

    Sina

    0 comments No comments

  2. TIANSHU LIN 20 Reputation points
    2023-06-10T08:39:52.9833333+00:00

    used Python to successfully convert the JSON to csv in seconds.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.