Parse bigJson (nested Json) files in Azure Data Lake Gen 2 and saving data into separate normalized tables in Azure SQL

Bose, Dibyendu (Cognizant) 1 Reputation point
2022-05-09T15:02:46.323+00:00

We have BigJson files (complex nested Json files 10 levels deep) stored in Azure Data Lake gen 2. We need to read the Json files, parse them and store the data into separate tables in Azure SQL DB.
So, it's NOT like parsing the Json into one single flattened denormalized table in SQL DW.
Can we do this by Azure Data Factory? What is the best way to achieve this?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,514 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,236 Reputation points
    2022-05-10T18:29:47.41+00:00

    Hello @Anonymous and welcome to Microsoft Q&A.

    If I understand you right, you want to not only flatten a very deeply nested JSON, but you want to split it into multiple tables. This is beyond the ability of Copy Activity, but I think Data Flow can accomplish this. This task sounds non-trivial, especially if the nesting forks.

    In the tags, you included Data Factory, Synapse, and Databricks. Can I take this to mean you are comfortable using a code-based solution , not just a visual interface? If that is so, more options become available, such as writing custom code to do the work in Synapse and Databricks notebooks / jars. There is also custom code in the Custom Activity.

    If all else fails there is using the SQL JSON functions.

    Whether do go Data Flow or custom code depends upon volume and complexity.

    As I don't know the details of your JSON structure, nor the criteria or desired output, I can't provide more precise advice.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.