Reading API response with nested JSON in copy activity

Gayatri K 5 Reputation points
2023-08-22T04:10:57.01+00:00

Hi
I am using a Copy activity to read a REST API response and store it in a data lake. The response is nested JSON, and I can't seem to flatten it by specifying mappings in the copy activity. I need to read the results array under GroupCalculatedNav and turn its items into rows.

Example response JSON

[
  {
    "d": {
      "results": [
        {
          "GroupCalculatedNav": {
            "results": [
              {
                "seqNumber": "1",
                "userId": "abc",
                "startDate": "2023-07-01T00:00:00Z",
                "ComponentGroupId": "A"
              },
              {
                "seqNumber": "1",
                "userId": "abc",
                "startDate": "2023-07-01T00:00:00Z",
                "payComponentGroupId": "B"
              }
            ]
          }
        },
        {
          "GroupCalculatedNav": {
            "results": [
              {
                "seqNumber": "1",
                "userId": "xyz",
                "startDate": "2023-07-01T00:00:00Z",
                "payComponentGroupId": "A"
              },
              {
                "seqNumber": "1",
                "userId": "xyz",
                "startDate": "2023-07-01T00:00:00Z",
                "payComponentGroupId": "B"
              }
            ]
          }
        }
      ]
    }
  }
]
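
The flattened output I'm after is one row per item of the inner results array, something like this:

seqNumber | userId | startDate            | payComponentGroupId
1         | abc    | 2023-07-01T00:00:00Z | A
1         | abc    | 2023-07-01T00:00:00Z | B
1         | xyz    | 2023-07-01T00:00:00Z | A
1         | xyz    | 2023-07-01T00:00:00Z | B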

Is this achievable with a copy activity at all?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. RevelinoB 3,675 Reputation points
    2023-08-22T04:24:53.5833333+00:00

    Hi Gayatri,

    Flattening a nested JSON structure like the one you've provided with Azure Data Factory's Copy activity is difficult. The Copy activity's schema mapping can unroll a single array via a collection reference, but it can't unroll two nested arrays (d.results and the inner GroupCalculatedNav.results) in one pass, so it isn't well suited to this shape of response.

    However, you can accomplish this with Azure Data Factory's Mapping Data Flow feature. Data flows provide richer transformation capabilities than the Copy activity, including a Flatten transformation designed for unrolling nested arrays.

    Here's a high-level outline of how you could approach this using Data Flows:

    • Source: Read the API response, either directly in the data flow (if your connector is supported as a data flow source) or from the raw JSON the Copy activity has already landed in the data lake.
    • Flatten transformation: Add a Flatten transformation and unroll by the inner array path (d.results.GroupCalculatedNav.results in your sample). This produces one row per item of the inner results array.
    • Derived Column / Select: After flattening, use a Derived Column or Select transformation to map the unrolled attributes (seqNumber, userId, startDate, payComponentGroupId) to the output columns you need.
    • Sink: Finally, use a Sink to write the flattened rows to your data lake.

    The above steps outline a general approach; the exact unroll path and mappings depend on your JSON structure and the desired output format. The sketch below shows the flattening logic the unroll performs.
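
    To make the unroll concrete, here is the equivalent flattening logic expressed as a small Python sketch. This is only to illustrate what the Flatten step does; the file name response.json is an assumption, and the field names come from your sample:

    import json

    # Load the API response (here read from a file the Copy activity has landed; adjust as needed).
    with open("response.json") as f:
        raw = json.load(f)

    rows = []
    for page in raw:                                            # top-level array
        for item in page["d"]["results"]:                       # d.results[]
            for rec in item["GroupCalculatedNav"]["results"]:   # GroupCalculatedNav.results[]
                rows.append({
                    "seqNumber": rec.get("seqNumber"),
                    "userId": rec.get("userId"),
                    "startDate": rec.get("startDate"),
                    "payComponentGroupId": rec.get("payComponentGroupId"),
                })

    # Each entry is one flattened record, e.g.:
    # {'seqNumber': '1', 'userId': 'abc', 'startDate': '2023-07-01T00:00:00Z', 'payComponentGroupId': 'A'}
    print(rows)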

    If you find that the transformation logic becomes too complex within Azure Data Factory's Data Flow, you might consider using a more specialized ETL (Extract, Transform, Load) tool or writing a custom script to handle the transformation outside of Azure Data Factory.
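
    If you do go the custom-script route, for example from an Azure Function or a Databricks/Synapse notebook, the end-to-end job stays small. A minimal sketch, where the URL, credentials, and output file name are placeholders rather than your real values:

    import csv
    import requests

    API_URL = "https://example.com/odata/v2/SomeEntity?$format=json"    # placeholder endpoint
    raw = requests.get(API_URL, auth=("user", "password")).json()       # adapt auth to your API

    # Flatten d.results[].GroupCalculatedNav.results[] into one row per inner item.
    rows = [
        rec
        for page in (raw if isinstance(raw, list) else [raw])
        for item in page["d"]["results"]
        for rec in item["GroupCalculatedNav"]["results"]
    ]

    fields = ["seqNumber", "userId", "startDate", "payComponentGroupId"]
    with open("flattened.csv", "w", newline="") as f:                   # placeholder output file
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)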

    Remember that JSON transformations like this can get complex, and the best approach depends on your data and your specific requirements. It's a good idea to test the data flow with a representative sample before applying it to your entire dataset. Hope this helps with your query.

