How to read nested JSON schema for moving data to different SQL tables?

Anuganti Suresh 200 Reputation points
2024-02-28T13:31:30.82+00:00

I have nested JSON and want to move the data into the following Azure SQL tables:

  1. persons
  2. role
  3. columns_partition
  4. columns_base
  5. columns_total
  6. columns_totimestandard
The pipeline is structured as:

Lookup --> ForEach (Switch) --> Stored Procedure

The Switch activity has been added with 6 cases, one data flow per case (a JSON sketch of the Switch follows the list):

  - CASE 1: df_persons
  - CASE 2: df_role
  - CASE 3: df_columns_partition
  - CASE 4: df_columns_base
  - CASE 5: df_columns_total
  - CASE 6: df_columns_totimestandard

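A minimal sketch of what the Switch activity's JSON might look like, assuming the Lookup output is iterated with `@item()` and switched on a hypothetical `tableType` field (only two of the six cases shown; property names abbreviated from the ADF authoring JSON, so verify against the pipeline's JSON view):

```json
{
  "name": "Switch_ByTableType",
  "type": "Switch",
  "typeProperties": {
    "on": {
      "value": "@item().tableType",
      "type": "Expression"
    },
    "cases": [
      {
        "value": "persons",
        "activities": [
          {
            "name": "Run_df_persons",
            "type": "ExecuteDataFlow",
            "typeProperties": {
              "dataflow": {
                "referenceName": "df_persons",
                "type": "DataFlowReference"
              }
            }
          }
        ]
      },
      {
        "value": "role",
        "activities": [
          {
            "name": "Run_df_role",
            "type": "ExecuteDataFlow",
            "typeProperties": {
              "dataflow": {
                "referenceName": "df_role",
                "type": "DataFlowReference"
              }
            }
          }
        ]
      }
    ],
    "defaultActivities": []
  }
}
```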

Each data flow contains the following sequence of transformations:
Source --> Flatten --> Filter --> Derived Column --> Sink
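In data flow script terms, the persons flow would look roughly like the chain below (a sketch only; the unroll path, mapped columns, and the derived column are illustrative and untested):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    documentForm: 'singleDocument') ~> srcJson
srcJson foldDown(unroll(dataflow.input),
    mapColumn(
        person_id = dataflow.input.persons.id,
        person_name = dataflow.input.persons.name,
        person_email = dataflow.input.persons.email
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) ~> Flatten1
Flatten1 filter(!isNull(person_id)) ~> Filter1
Filter1 derive(load_ts = currentTimestamp()) ~> DerivedColumn1
DerivedColumn1 sink(allowSchemaDrift: true,
    validateSchema: false) ~> sinkPersons
```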

Query: What is the best way to create the dataset for the data flow Source: a single shared dataset or an individual dataset per flow? The issue is that the dataset schema is inferred from only the first object of the array.
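For reference, a JSON source dataset of the kind in play here, with the schema left empty so nothing is pinned to the first array object (a sketch; the linked service, container, and file names are placeholders):

```json
{
  "name": "ds_nested_json",
  "properties": {
    "type": "Json",
    "linkedServiceName": {
      "referenceName": "ls_blob_storage",
      "type": "LinkedServiceReference"
    },
    "schema": {},
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": "nested.json"
      }
    }
  }
}
```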

Related question: https://learn.microsoft.com/en-us/answers/questions/1518229/is-there-limitation-for-nested-json-to-show-data-t?ns-enrollment-type=Collection&ns-enrollment-id=7eqhq5kz2mnqp

Nested JSON:

```json
{
    "code": "SUCCESS",
    "dataflow": {
        "input": [
            {
                "persons": {
                    "Type": "person",
                    "id": "12345",
                    "user_id": "abcd",
                    "email": "******@hotmail.com",
                    "type": "Internal",
                    "name": "hari",
                    "role": [
                        {
                            "role_sno": 11,
                            "role_name": "personal",
                            "role_type": "confidential"
                        }
                    ],
                    "is_access": true,
                    "display_id": "",
                    "delet": false,
                    "ability": false
                },
                "columns": [
                    {
                        "Type": "partition",
                        "group_id": "MANAGEMENT",
                        "type_id": "UPDATED",
                        "public_ip": "0.1.1.4",
                        "version": "05",
                        "platform": "net",
                        "browse": "google",
                        "model": "nxt",
                        "time": 1673999150117,
                        "modified": "2024-01-16T14:25:50Z",
                        "partition": {
                            "online": true
                        },
                        "partition_data": {
                            "message": "",
                            "status": false,
                            "start_time": 0,
                            "end_time": 0,
                            "online": true,
                            "in_meet": false
                        }
                    },
                    {
                        "Type": "base",
                        "group_id": "id",
                        "type_id": "LOGGED",
                        "public_ip": "199.166.14.76",
                        "version": "50",
                        "platform": "quera",
                        "browse": "explorer",
                        "model": "mac",
                        "time": 1673987691319,
                        "modified": "2023-01-17T20:34:51Z"
                    },
                    {
                        "Type": "total",
                        "group_id": "id",
                        "type_id": "LOGGED",
                        "public_ip": "1.16.11.12",
                        "version": "50",
                        "platform": "quera",
                        "browse": "explorer",
                        "model": "mac",
                        "time": 1673987691319,
                        "modified": "2023-08-17T20:34:51Z"
                    },
                    {
                        "Type": "totimestandard",
                        "group_id": "ids",
                        "type_id": "out",
                        "public_ip": "20.11.00.16",
                        "version": "55",
                        "platform": "zero",
                        "browse": "unix",
                        "model": "mac",
                        "time": 1673900091319,
                        "modified": "2023-11-17T20:34:51Z"
                    }
                ]
            }
        ]
    }
}
```

1 answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-02-28T14:27:41.05+00:00

    I always recommend starting by understanding the structure of your JSON data. In your case, the JSON is nested, with arrays (input, role, columns) and objects (persons) whose contents need to be mapped to different SQL tables. If ADF introduces a dedicated Parse activity in the future, it would simplify extracting nested JSON structures: you would specify the schema of your JSON and it would automatically parse the nested structures into a flat format suitable for SQL tables.

    So what should you do today? For each data flow, start with a Source transformation that points to the JSON file. Use the Flatten transformation to turn nested arrays into a format that can be iterated over or mapped directly to SQL columns; in its settings, specify the array field you wish to flatten (e.g., $.dataflow.input[*].persons for the persons case). After flattening, use the Filter transformation to keep only the objects relevant to the current case. The Derived Column transformation can transform existing columns or add the new ones your SQL table schema needs. Finally, in the Sink transformation, specify the Azure SQL table where the data should be inserted; you might need a unique dataset for each data flow so the schema matches the target table's structure.

    To deal with the dataset schema limitation, consider creating a parameterized dataset in ADF that can dynamically adjust the schema based on the input data flow: you can pass the schema as a parameter from the Data Flow activities to the dataset. This requires a more advanced setup but is very flexible.
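To make the Flatten and Filter steps concrete, here is a rough data flow script sketch for the df_columns_partition case (column list abbreviated; the unroll path mirrors the posted JSON but is untested):

```
srcJson foldDown(unroll(dataflow.input.columns),
    mapColumn(
        Type = dataflow.input.columns.Type,
        group_id = dataflow.input.columns.group_id,
        type_id = dataflow.input.columns.type_id,
        platform = dataflow.input.columns.platform,
        modified = dataflow.input.columns.modified
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) ~> FlattenColumns
FlattenColumns filter(Type == 'partition') ~> FilterPartition
```

The df_columns_base, df_columns_total, and df_columns_totimestandard flows would differ only in the filter condition (Type == 'base', and so on).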

