Flatten JSON with same name nodes in serverless

Ásgeir Gunnarsson 61 Reputation points
2022-07-13T12:11:27.857+00:00

Hi all clever people out there.

Assume I have a JSON file with the following structure

{  
    "Node1": [  
        {  
            "id": "xxxxxxxxxx",  
            "name": "The name",  
            "Sub-node1": [  
                {  
                    "Type": "The type",  
                    "id": "xxxxxxxxxxxxxx",  
                    "users": [  
                        {  
                            "UserType": "The main person",  
				    "UserRights": "Owner",  
                            "UPN": "xxxxxx@yyyyyy.com"  
                        }  
                    ]  
                },  
                {  
                    "Type": "The type",  
                    "id": "yyyyyyyyyyyyyyy",  
                    "users": [  
                        {  
                            "UserType": "The main main person",  
				    "UserRights": "Admin",  
                            "UPN": "yyyyyyyy@yyyyyy.com"  
                        }  
                    ]  
                }  
            ],  
            "Sub-node2": [  
                {  
                    "Type": "The type",  
                    "id": "xxxyxxxxxyxxxxxx",  
                    "users": [  
                        {  
                            "UserType": "The main person",  
				    "UserRights": "Admin",  
                            "UPN": "xxxxxx@yyyyyy.com"  
                        }  
                    ]  
                },  
                {  
                    "Type": "The type",  
                    "id": "xxxxxxxxxxxyxxyx",  
                    "users": [  
                        {  
                            "UserType": "The main main person",  
				    "UserRights": "Viewer",  
                            "UPN": "yyyyyyyy@yyyyyy.com"  
                        }  
                    ]  
                }  
            ],  
            "users": [  
                {  
                    "UserType": "Superman",  
                    "UserRights": "Admin",  
                    "UPN": "yyyyyyyy@yyyyyy.com"  
                }  
            ]  
        }  
}  

Notice that there is a users node directly under the main node (Node1) and also in each sub-node. Normally I would flatten the JSON in Serverless using CROSS APPLY. Something like this:

SELECT   
Node1Id, Node1_Name, Sub-node1Type, Sub-node1_id, Sub-node2Type, Sub-node2_id, UserType, UserRights, UPN  
FROM  
    OPENROWSET(  
        BULK 'https://xxxxxxxx.dfs.core.windows.net/root/yyyyyyyy/zzzzzzzz/*/*/*/*/*',  
        FORMAT = 'CSV',  
        FIELDQUOTE = '0x0b',  
        FIELDTERMINATOR ='0x0b'  
    )  
    WITH (  
        jsonContent varchar(MAX)  
    ) AS [rows]  
CROSS APPLY openjson (jsonContent)  
    WITH (  
        Node1 NVARCHAR(MAX) AS JSON  
    )  
    CROSS APPLY openjson (Node1)  
    WITH (  
        Node1Id VARCHAR(50) '$.id',  
        Node1_Name VARCHAR(50) '$.name',  
        Sub-node1 NVARCHAR(MAX) AS JSON,  
        Sub-node2 NVARCHAR(MAX) AS JSON,  
        users NVARCHAR(MAX) AS JSON  
    )  
     
CROSS APPLY openjson (Sub-node1)  
    WITH (  
        Sub-node1Type varchar(255) '$.Type',  
        Sub-node1_id varchar(255) '$.id',  
        users NVARCHAR(MAX) AS JSON  
    )  
  
CROSS APPLY openjson (Sub-node2)  
    WITH (  
        Sub-node2Type varchar(255) '$.Type',  
        Sub-node2_id varchar(255) '$.id',  
        users NVARCHAR(MAX) AS JSON  
    )  
  
CROSS APPLY openjson (users)  
    WITH (  
        UserType varchar(50) '$.UserType',  
        UserRights NVARCHAR(100) '$.UserRights',  
        UPN varchar(100) '$.UPN',  
    )  
  
)  

With this method I can only get one for the users nodes out as you cannot CROSS APPLY the same named node multiple times. I cannot seem to figure out a way to name the node differently when I reference it as JSON in the sub-node CROSS APPLY and I cannot figure out a way to reference users as using some kind of path ('$.sub-node1.users' was my initial thought).

Does anyone know if it´s possible to get all the different User nodes from the main node and all the sub-nodes in a single SELECT (I have done it in multiple selects with one sub-node at a time)?

Would love to hear everyone's input.

Ásgeir

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,148 questions
0 comments No comments
{count} votes

Accepted answer
  1. Jovan Popovic 86 Reputation points
    2022-08-12T09:20:24.96+00:00

    Instead of multiple CROSS APPLY that will cross-join all nested sub-arrays, try tu use a single top-level cross apply with UNION ALL that will concatenate nested sub-arrays - something like:

    SELECT *  
    FROM  
         OPENROWSET(  
             BULK 'https://xxxxxxxx.dfs.core.windows.net/root/yyyyyyyy/zzzzzzzz/*/*/*/*/*',  
             FORMAT = 'CSV',  
             FIELDQUOTE = '0x0b',  
             FIELDTERMINATOR ='0x0b'  
         )  
         WITH (  
             value varchar(MAX)  
         ) AS [rows]  
    CROSS APPLY (  
    
                SELECT [Sub-node1.users].* FROM OPENJSON(value, '$."Sub-node1"') [Sub-node1]  
                    CROSS APPLY OPENJSON([Sub-node1].value, '$.users')  
                    WITH(UserType NVARCHAR(100), UserRights NVARCHAR(100),UPN NVARCHAR(100))  
                    as [Sub-node1.users]  
    
                UNION ALL  
    
                SELECT [Sub-node2.users].* FROM OPENJSON(value, '$."Sub-node2"') AS [Sub-node2]  
                    CROSS APPLY OPENJSON([Sub-node2].value, '$.users')  
                    WITH(UserType NVARCHAR(100), UserRights NVARCHAR(100),UPN NVARCHAR(100))  
                    as [Sub-node2.users]  
    
                UNION ALL  
    
                SELECT users.* FROM OPENJSON(value, '$."users"')  
                    WITH(UserType NVARCHAR(100), UserRights NVARCHAR(100),UPN NVARCHAR(100))  
                    as[users]  
    ) as users  
    

    230680-image.png


2 additional answers

Sort by: Most helpful
  1. AnnuKumari-MSFT 29,601 Reputation points Microsoft Employee
    2022-07-15T13:28:21.327+00:00

    Hi @Ásgeir Gunnarsson ,
    Thankyou for using Microsoft Q&A platform and thanks for posting your question.
    As I understand your query, you are trying to flatten the JSON data. To achieve the same , you are using Synapse notebook to write the query. Please let me know if that's not the case.

    You can consider using Flatten transformation in Mapping Dataflow to achieve the requirement.

    To know more about the implementation , kindly watch : https://www.youtube.com/watch?v=zrjYg2_2Y9I


  2. AnnuKumari-MSFT 29,601 Reputation points Microsoft Employee
    2022-07-19T09:58:46.217+00:00

    Hi @Ásgeir Gunnarsson ,

    I understand you are trying to flatten the JSON data using T-SQL script. However, As I can see the data is having nested arrays which is difficult to be flattened using T-SQL. You can use Dataflow which is UI based Data transformation tool in ADF and Synapse capable of flattening multi-level nested JSON , here is how you can achieve the above requirement using mapping dataFlow.

    1. Create a dataset pointing to the JSON file in your storage account, use the same in source transformation.

    222244-image.png

    • Check the data preview tab
      222170-image.png 2. Use Flatten transformation to unroll the 'Node1' array.
      222158-image.png
    • Check the data preview tab

    222177-image.png

    3. Use another Flatten transformation to unroll the nested array 'SubNode'
    222261-image.png

    • Check the data preview tab
      222159-image.png

    4. Use another Flatten transformation to unroll the nested array 'Users'
    222252-image.png

    • Check the data preview tab
      222262-image.png

    5. Add a sink transformation and point the dataset to the target datastore and load it by running the pipeline after calling dataflow activity.

    Hope this will help. Please let us know if any further queries.


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators