Pipeline Copy Activity - JSON Source With Dynamic Schema

28026517 20 Reputation points
2023-07-10T19:58:59.51+00:00

Greetings,

Looking for recommendations on how you would solve the following problem in an Azure Synapse Analytics Pipeline.

I have a child pipeline that looks like the following. This pipeline fetches a series of "groups" from the source system, iterates over that group via a For Each and calls a Copy activity. The source of the Copy activity is a rest component that returns JSON content. The API endpoint in question returns 50 records per request, which is not configurable other than the ability to specify an offset argument to iterate through the paginated results and this is where I am experiencing issues.

Depending on how many records are in the response, the JSON will either return a singular JSON object or an array of objects. The Mapping on the copy activity is defined below, and as you will see, is expecting an array of objects. When a request produces a response with a single JSON object, the inner copy activity of the For Each fails, therefore, failing that step.

How does one handle dynamic JSON results where the format of the schema changes depending on the number of results in a response?

Pipeline: User's image

Pagination Documentation:

User's image

Mapping:

User's image

Sample JSON:

User's image

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
{count} votes

Accepted answer
  1. QuantumCache 20,366 Reputation points Moderator
    2023-07-10T21:53:44.5033333+00:00

    Hello @28026517

    Handling dynamic JSON results with varying schema based on the number of results can be challenging.

    In Mapping Data Flows, Did you try the "Derived Column" transformation to create a new column that contains the dynamic schema.

    I have tested this scenario by loading the input source JSON node (i.e Status) into a Derived column (In DataFlow)

    User's image

    I have used the Derived Column to read the json node called 'Status' as a new column.
    User's image

    Below is the output for 2 scenarios. Later i will make use of these in further processing!

    User's image

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.