Cannot get subarray in a array when using full fidelity schema in Synapse

Wenwei Nie 6 Reputation points
2022-06-11T06:17:42.3+00:00

I using Synapse to get data from cosmos using full fidelity schema, the data is nested with many subarrays in array, when I use OUTER APPLY OPENJSON to flat data, it just could get array in a object, but cannot get subarray in an array.
It works if I do not use full fidelity, I mean it could get any subarrays even thought it nested many layer, but if I use full fidelity, it did not work, but the data structure is very complicated and I have to use full fidelity.
So, my question is how could I get subarray in nested data using full fidelity schema?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,172 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,750 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Wenwei Nie 6 Reputation points
    2022-06-14T04:55:13.35+00:00

    Hi MartinJaffer-MSFT, thank you for reply.
    I did not get erros, just data missing.
    For instance, I have a document named "company-configurations" in cosmos like below

    {
    "workFlowId": "326abc9e-ba3a-4f6a-ac9e-397e59d73e63",
    "name": "Generalized Workflow",

    "questionSets": [  
        {  
            "id": "6ceef824-7d34-4652-861a-481cb1ff8ff4",  
              
            "questionSetName": "Loan Question Set",  
            "questions": [  
                {  
                    "id": "ef8a2fcb-7237-4b83-9486-da80a8c4bd00",  
                    "question": [  
                        {  
                            "languageId": "en",  
                            "languageName": "English",  
                            "isDefault": true,  
                            "question": "What is your email?"  
                        },  
                        {  
                            "languageId": "hr",  
                            "languageName": "Croatian",  
                            "isDefault": false,  
                            "question": "Koji je tvoj email?"  
                        },  
                        {  
                            "languageId": "cs",  
                            "languageName": "Czech",  
                            "isDefault": false,  
                            "question": "Jaký je tvůj email?"  
                        }  
                    ],  
                    "shortName": "Email",  
                    "type": "EMAIL",  
                    "typeSetting": null  
                }  
                
                  
            ]  
              
        }  
    ]  
    

    }
    As you can see it has an array "questionSets", and it has a subarray "questions", it has an subarray "question"
    Then I use SQL script like below to get data

    SELECT TOP 100 * FROM OPENROWSET(​PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=cosmos-global-syn;Database=q',
    OBJECT = 'company-configurations',
    SERVER_CREDENTIAL = 'cosmos-global')
    WITH (
    workFlowId VARCHAR(256) COLLATE Latin1_General_100_BIN2_UTF8 '$.workFlowId.string',
    questionSets VARCHAR(MAX) '$.questionSets.array',
    type VARCHAR(256) COLLATE Latin1_General_100_BIN2_UTF8 '$.type.string'

    ) AS [company-configurations]   
    OUTER APPLY OPENJSON ( [company-configurations].questionSets )  
    	 WITH (  
    		id VARCHAR(256) COLLATE Latin1_General_100_BIN2_UTF8,  
    		questions NVARCHAR(MAX) AS JSON 		  
    	) AS [company-configurations_questionSets]  
    	OUTER APPLY OPENJSON ( [company-configurations_questionSets].questions )  
    		 WITH (  
    			id VARCHAR(256) COLLATE Latin1_General_100_BIN2_UTF8,  
    			question NVARCHAR(MAX) AS JSON   
    		) AS [company-configurations_questionSets_questions]  
    		OUTER APPLY OPENJSON ( [company-configurations_questionSets_questions].question )  
    			 WITH (  
    				question VARCHAR(256) COLLATE Latin1_General_100_BIN2_UTF8				  
    			) AS [company-configurations_questionSets_questions_question]  
    

    But in the column "questionSets", I can only get something like below

    [{"object":{"id":{"string":"da87f905-2cbf-489e-86af-50bdfa85a27f"},}},"questionSetName":{"string":"Loan Question Set"}}}]

    it could not get the data in subarray, it even not show the name of "questions".

    It happens in full fidelity schema. If I use Well-defined schema, I can get the subarray. but I have to use full fidelity schema, because I am keep changing the data structure in cosmos.

    Is there any way to make it works with full fidelity schema?

    Thanks


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.