How to split a JSON array/object into table rows using Azure Data Factory

Vivek Komarla Bhaskar 956 Reputation points
2024-02-27T12:45:38.0633333+00:00

Hi, I would like to split a JSON array/object into table rows using Azure Data Factory. It is important to note that the property names keep changing; as you can see in my screenshot below, they include deferral, hard_bounce, no_stories_found, other, reject, retry, sent, soft_bounce, and spam. Since these values change constantly, the conversion process should work in real time. See below screenshot - Screenshot 2024-02-27 at 11.58.59

Expected output - Screenshot 2024-02-27 at 12.32.53
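In case the screenshots do not render, here is an illustrative sketch of the shape (the real field names and values differ; only the metric keys listed above are taken from my actual data). The source JSON is roughly an array of objects whose metric keys vary per element, and the goal is one output row per property:

```json
{
  "stats": [
    {"time": "2024-02-26", "deferral": 3, "hard_bounce": 1, "sent": 120},
    {"time": "2024-02-27", "reject": 2, "soft_bounce": 4, "spam": 1}
  ]
}
```

time       | property    | value
2024-02-26 | deferral    | 3
2024-02-26 | hard_bounce | 1
2024-02-26 | sent        | 120
2024-02-27 | reject      | 2
2024-02-27 | soft_bounce | 4
2024-02-27 | spam        | 1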


1 answer

  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-02-27T14:13:38.98+00:00

    @Vivek Komarla Bhaskar

    Thanks for reaching out to Microsoft Q&A.

    There are two main approaches to splitting a JSON array/object into table rows in Azure Data Factory (ADF), even when the properties keep changing:

    1. Using Data Flow Activity:

    • Source: Add a source transformation that reads the JSON data via a dataset pointing to a store such as Azure Blob Storage or ADLS Gen2.
    • Select Transformation (Optional): Add a Select transformation to keep only the columns you need before unrolling.
    • Flatten Transformation: Use the Flatten transformation and set the JSON array as its "Unroll by" field. This emits a new row for each element of the array (a sketch of the equivalent logic follows this list).
    • Derived Column Transformation (Optional): Optionally add a Derived Column transformation to build new columns from the values within each array element, for example extracting specific properties into their own columns.
    • Sink: Add a sink transformation to write the transformed data to a destination such as Azure SQL Database, Azure Synapse Analytics, or another data store.
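
    For intuition, here is a minimal Python sketch of what the Flatten step, plus an unpivot of the changing keys into rows, produces. The stats wrapper key, the time field, and the sample values are assumptions for illustration, not anything ADF-specific:

    ```python
    import json

    # Hypothetical input: an array of per-period objects whose metric keys
    # (deferral, hard_bounce, sent, ...) differ from element to element.
    raw = """
    {"stats": [
      {"time": "2024-02-26", "deferral": 3, "hard_bounce": 1, "sent": 120},
      {"time": "2024-02-27", "reject": 2, "soft_bounce": 4, "spam": 1}
    ]}
    """
    doc = json.loads(raw)

    # Flatten: one row per array element; then unpivot the varying keys so
    # each (time, property, value) triple becomes its own table row.
    rows = [
        {"time": el["time"], "property": key, "value": val}
        for el in doc["stats"]
        for key, val in el.items()
        if key != "time"
    ]

    for row in rows:
        print(row)  # e.g. {'time': '2024-02-26', 'property': 'deferral', 'value': 3}
    ```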

    2. Using Custom Code (Azure Function or Custom Activity):

    • Source: As with the Data Flow approach, the JSON data sits in a store such as Blob Storage or ADLS Gen2.
    • Custom Code: Add an Azure Function activity, or a Custom activity running on Azure Batch, and use a language such as Python or PowerShell to process the JSON. A standard JSON parsing library can iterate through the array and emit one row per element, and because the property keys are read at run time rather than hard-coded, changing properties are handled naturally (see the sketch after this list). Note that ADF's built-in Script activity runs SQL against a database, so it is not the right fit for parsing JSON with custom code.
    • Sink: Write the transformed rows to the destination, for example with a Copy activity or directly from the function.
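
    As a rough sketch of the custom-code route (not a drop-in solution): the following Python could run inside an Azure Function called from the pipeline. The connection string, container and blob names, and the stats wrapper key are placeholders, and it assumes the azure-storage-blob package is installed:

    ```python
    import csv
    import io
    import json

    from azure.storage.blob import BlobServiceClient  # assumed dependency

    # Placeholder connection details; in practice these would come from
    # pipeline parameters or Key Vault, not hard-coded strings.
    CONN_STR = "<storage-connection-string>"
    SOURCE = ("input-container", "events.json")
    TARGET = ("output-container", "events.csv")

    def unpivot(doc: dict) -> list[dict]:
        """Turn each array element's changing keys into (time, property, value) rows."""
        return [
            {"time": el.get("time"), "property": key, "value": val}
            for el in doc.get("stats", [])  # "stats" is a hypothetical wrapper key
            for key, val in el.items()
            if key != "time"                # keep "time" as the row identifier
        ]

    def main() -> None:
        service = BlobServiceClient.from_connection_string(CONN_STR)
        blob_in = service.get_blob_client(*SOURCE)
        doc = json.loads(blob_in.download_blob().readall())

        rows = unpivot(doc)

        # Serialize the rows as CSV and write them back to storage.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["time", "property", "value"])
        writer.writeheader()
        writer.writerows(rows)

        blob_out = service.get_blob_client(*TARGET)
        blob_out.upload_blob(buf.getvalue(), overwrite=True)

    if __name__ == "__main__":
        main()
    ```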

    Handling Changing Properties: Both methods can handle changing properties. The Data Flow approach is visual and easier to set up, while the custom-code approach offers more flexibility for complex transformations and highly dynamic data. Here are some additional points to consider:

    • Error Handling: It's important to implement proper error handling in your ADF pipeline to handle any potential issues during the data transformation process.
    • Performance: The performance of each approach can vary depending on the complexity of the JSON data and the volume of data being processed. It's recommended to test and compare both methods to determine the best option for your specific scenario.

    For a more detailed walkthrough, see the Flatten transformation documentation for mapping data flows in the Azure Data Factory documentation.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.

