Dynamic parsing of json in BulkColumn using SQL Server TSQL

Raj D 616 Reputation points
2023-03-13T20:45:29.8+00:00

OPENJSON in SQL Server using with statement dynamically on json which has a mix of string object and array values. I am able to parse json dynamically as long as JSON data has consistent rows has no complex objects or array such as.

I'm able to parse this below json dynamically without writing the SQL query like below.

JSON:

{
    "Data": [
{ "name": "ABC", "date": "2020-01-20", "projectid": "1234" },
{ "name": "DEF", "date": "2020-01-30", "projectid": "4567" }
]
}

SQL:

DECLARE @JSON VARCHAR(MAX)

SELECT @JSON = BulkColumn
FROM OPENROWSET (BULK 'c:/data/project.json', SINGLE_CLOB) X

SELECT *
FROM OPENJSON (@JSON, '$.Data') WITH (
   [name] NVARCHAR(256),
   [date] DATE,
   [projectid] INT 
)

We receive data from a third-party vendor who sends similar information to different clients and they say its hard for them to keep track of each individual client's metadata and schema to keep them informed. That's the reason why we end up having to parse this unstructured json. Some projects have modifications and based on what the modifications are more fields are added to that project. I'm now exploring options to parse json in [BulkColumn] dynamically in Azure SQL database Server or SQL Server 2019 for the below json that has no fixed schema.

I tried the below SQL query but does not work with unstructured json.

SQL:

	SET NOCOUNT ON 
    IF OBJECT_ID ('tempdb..##tmpjson') IS NOT NULL DROP TABLE ##tmpjson

    DECLARE @tvjson TABLE (Id INT IDENTITY(1,1) , JsonCol NVARCHAR(MAX))

    INSERT INTO @tvjson (JsonCol)
    VALUES ('{
    "Data": [        
        {
            "name": "ABC",
            "date": "2020-01-20",
            "subproject": [
                {
                    "id": "123",
                    "projectname": "new1",
                    "refnum": "123:new1"
                },
                {
                    "id": "456",
                    "projectname": "new2",
                    "refnum": "456:new2"
                }
            ],
            "projectid": "1234",
            "projectdate": "2020-01-27"
        },
        {
            "name": "DEF",
            "date": "2020-01-30",
            "subproject": [
                {
                    "id": "789",
                    "projectname": "new3",
                    "refnum": "789:new3"
                },
                {
                    "id": "901",
                    "projectname": "new4",
                    "refnum": "901:new4"
                }
            ],
            "projectid": "4567",
            "projectdate": "2020-02-07"
        },
        {
            "name": "HGI",
            "date": "2019-03-31",
            "subproject": [
                {
                    "id": "327",
                    "projectname": "new8",
                    "refnum": "327:new8"
                },
                {
                    "id": "203",
                    "projectname": "new24",
                    "refnum": "203:new24",
					"modifieddate": "2022-03-07"
                }
            ],
            "projectid": "9281",
            "projectdate": "2020-04-10",
            "projectmodifieddate": "2022-03-07"
        },
        {
            "name": "IKL",
            "date": "2019-05-30",
            "subproject": [
                {
                    "id": "308",
                    "projectname": "new28",
                    "refnum": "308:new28"
                },
                {
                    "id": "113",
                    "projectname": "new31",
                    "refnum": "113:new31",
					"modifieddate": "2022-03-07"
                }
            ],
            "projectid": "7324",
            "projectdate": "2020-05-31",
            "projectmodifieddate": "2022-03-07",
			"projectdescription": "This is a test project"
        }
    ]
}')

    ;WITH ctejson AS 
    (
    SELECT * 
      FROM @tvjson
     CROSS APPLY OPENJSON (JsonCol, '$.Data') j
    )

    ,jsonData AS 
    (
    SELECT Id, [key], [value]
      FROM ctejson 
     WHERE [type] != 5
     UNION ALL
    SELECT b.Id, b.[Key] +'_'+ t.[key] [key], t.[value]
      FROM ctejson  b
     CROSS APPLY OPENJSON ([value]) t
     WHERE b.[type] = 5 
    )

    SELECT *
      INTO ##tmpjson 
      FROM jsonData

    DECLARE @cols NVARCHAR(MAX)=''
    DECLARE @pivot NVARCHAR(MAX)

    SELECT @cols +=','+ QUOTENAME([key])
      FROM ##tmpjson
     GROUP BY [key]

    SET @cols = STUFF(@cols,1,1,'')

    SET @pivot = 'SELECT * 
				    FROM ##tmpjson 
				   PIVOT (MAX([value]) FOR [key] IN ('+@cols+')) j'

    EXEC sp_executesql @pivot 
    DROP TABLE ##tmpjson

Output:

name date subproject_id subproject_projectname subproject_refnum subproject_modifieddate projectid projectdate projectmodifieddate projectdescription
ABC 1/20/2020 123 new1 123:new1 NULL 1234 1/27/2020 NULL NULL
ABC 1/20/2020 456 new2 456:new2 NULL 1234 1/27/2020 NULL NULL
DEF 1/30/2020 789 new3 789:new3 NULL 4567 2/7/2020 NULL NULL
DEF 1/30/2020 901 new4 901:new4 NULL 4567 2/7/2020 NULL NULL
HGI 3/31/2019 327 new8 327:new8 NULL 9281 4/10/2020 3/7/2022 NULL
HGI 3/31/2019 203 new24 203:new24 3/7/2022 9281 4/10/2020 3/7/2022 NULL
IKL 5/30/2019 308 new28 308:new28 NULL 7324 5/31/2020 NULL NULL
IKL 5/30/2019 113 new31 113:new31 3/7/2022 7324 5/31/2020 3/7/2022 This is a test project

I am looking for a generic query that would handle structured and unstructured json documents.

JSON:

{
    "Data": [        
        {
            "name": "ABC",
            "date": "2020-01-20",
            "subproject": [
                {
                    "id": "123",
                    "projectname": "new1",
                    "refnum": "123:new1"
                },
                {
                    "id": "456",
                    "projectname": "new2",
                    "refnum": "456:new2"
                }
            ],
            "projectid": "1234",
            "projectdate": "2020-01-27"
        },
        {
            "name": "DEF",
            "date": "2020-01-30",
            "subproject": [
                {
                    "id": "789",
                    "projectname": "new3",
                    "refnum": "789:new3"
                },
                {
                    "id": "901",
                    "projectname": "new4",
                    "refnum": "901:new4"
                }
            ],
            "projectid": "4567",
            "projectdate": "2020-02-07"
        },
        {
            "name": "HGI",
            "date": "2019-03-31",
            "subproject": [
                {
                    "id": "327",
                    "projectname": "new8",
                    "refnum": "327:new8"
                },
                {
                    "id": "203",
                    "projectname": "new24",
                    "refnum": "203:new24",
					"modifieddate": "2022-03-07"
                }
            ],
            "projectid": "9281",
            "projectdate": "2020-04-10",
            "projectmodifieddate": "2022-03-07"
        },
        {
            "name": "IKL",
            "date": "2019-05-30",
            "subproject": [
                {
                    "id": "308",
                    "projectname": "new28",
                    "refnum": "308:new28"
                },
                {
                    "id": "113",
                    "projectname": "new31",
                    "refnum": "113:new31",
					"modifieddate": "2022-03-07"
                }
            ],
            "projectid": "7324",
            "projectdate": "2020-05-31",
            "projectmodifieddate": "2022-03-07",
			"projectdescription": "This is a test project"
        }
    ]
}
SQL Server | Other
{count} votes

2 answers

Sort by: Most helpful
  1. LiHongMSFT-4306 31,616 Reputation points
    2023-03-15T02:45:16+00:00

    Hi @Raj D

    For JSON string with Nested array of elements, you could use JSON_Value() together with OPENJSON() function. Like this:

    DECLARE @json NVARCHAR(max)
    SELECT @json = 
    N'{
        "Data": [        
            {
                "name": "ABC",
                "date": "2020-01-20",
                "subproject": [
                    {
                        "id": "123",
                        "projectname": "new1",
                        "refnum": "123:new1"
                    },
                    {
                        "id": "456",
                        "projectname": "new2",
                        "refnum": "456:new2"
                    }
                ],
                "projectid": "1234",
                "projectdate": "2020-01-27"
            },
            {
                "name": "DEF",
                "date": "2020-01-30",
                "subproject": [
                    {
                        "id": "789",
                        "projectname": "new3",
                        "refnum": "789:new3"
                    },
                    {
                        "id": "901",
                        "projectname": "new4",
                        "refnum": "901:new4"
                    }
                ],
                "projectid": "4567",
                "projectdate": "2020-02-07"
            },
            {
                "name": "HGI",
                "date": "2019-03-31",
                "subproject": [
                    {
                        "id": "327",
                        "projectname": "new8",
                        "refnum": "327:new8"
                    },
                    {
                        "id": "203",
                        "projectname": "new24",
                        "refnum": "203:new24",
    					"modifieddate": "2022-03-07"
                    }
                ],
                "projectid": "9281",
                "projectdate": "2020-04-10",
                "projectmodifieddate": "2022-03-07"
            },
            {
                "name": "IKL",
                "date": "2019-05-30",
                "subproject": [
                    {
                        "id": "308",
                        "projectname": "new28",
                        "refnum": "308:new28"
                    },
                    {
                        "id": "113",
                        "projectname": "new31",
                        "refnum": "113:new31",
    					"modifieddate": "2022-03-07"
                    }
                ],
                "projectid": "7324",
                "projectdate": "2020-05-31",
                "projectmodifieddate": "2022-03-07",
    			"projectdescription": "This is a test project"
            }
        ]
    }'
      
    SELECT
        JSON_Value (p.value, '$.name') as [name], 
        JSON_Value (p.value, '$.date') as [date], 
        JSON_Value (s.value, '$.id') as subproject_id, 
        JSON_Value (s.value, '$.projectname') as subproject_projectname,
    	JSON_Value (s.value, '$.refnum') as subproject_refnum,
    	JSON_Value (s.value, '$.modifieddate') as subproject_modifieddate,
    	JSON_Value (p.value, '$.projectid') as projectid,
    	JSON_Value (p.value, '$.projectdate') as projectdate,
    	JSON_Value (p.value, '$.projectmodifieddate') as projectmodifieddate,
    	JSON_Value (p.value, '$.projectdescription') as projectdescription
    FROM OPENJSON (@json, '$.Data') as p
    CROSS APPLY OPENJSON (p.value, '$.subproject') as s
    

    Refer to this article for more details and methods: Reading JSON string with Nested array of elements

    Best regards,

    Cosmog Hong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    1 person found this answer helpful.

  2. Erland Sommarskog 129.4K Reputation points MVP Volunteer Moderator
    2023-03-15T18:17:25.73+00:00

    Let's take a step back and look around where you are. You are in a relational database, and in a relational database a query is supposed to return a table, with a fixed set of columns. Your request is the antithesis of this. If the document has the fields Raspberry, Strawberry and Gooseberry, you want these three columns. If the document has the fields Onion, Horseradish and Pepper you want these columns. At least, that is as I understand it.

    And top of all, could the JSON have just any shape? Or is that some pattern you can assume? I ask, because in the former case, you are looking at a very big challenge.

    And, it the end, how do you intend to consume the result set? Producing a result set that can have any shape also presents a challenge for the consumer of the data.

    In the end, I am not sure that SQL Server is the best venue for this. It may be better to read the document in a .NET client and parse with help of JSON classes.


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.