Understand Data Factory Data Flow REST source behaviour when the response is json array

Fabio Carello 31 Reputation points
2023-10-19T15:28:01.3633333+00:00

I’m trying to understand how the REST source works in a data factory Data Flow. I will start with a simple case. This is a sample response of my API (https://reqres.in/api/users):

{
    "page": 2,
    "per_page": 6,
    "total": 12,
    "total_pages": 2,
    "data": [
        {
            "id": 7,
            "email": "michael.lawson@reqres.in",
            "first_name": "Michael",
            "last_name": "Lawson",
            "avatar": "https://reqres.in/img/faces/7-image.jpg"
        },
        {
            "id": 8,
            "email": "lindsay.ferguson@reqres.in",
            "first_name": "Lindsay",
            "last_name": "Ferguson",
            "avatar": "https://reqres.in/img/faces/8-image.jpg"
        },
    ],
    "support": {
        "url": "https://reqres.in/#support-heading",
        "text": "To keep ReqRes free, contributions towards server costs are appreciated!"
    }
}

As you can see, the response payload is a Json Object with a data array containing the actual data content.

I need to unroll the data field and save in a csv file using a Data Flow. That's pretty straightforward:

  • Source Projection is defining a field named data of type complex array inside the body column: enter image description here
  • Let's now unroll by data field and use an explicit mapping: enter image description here
  • Just make data flow into the Sink with the auto-map feature: enter image description here

Now, let's change our source, with a slightly different response (https://jsonplaceholder.typicode.com/posts):

[
  {
    "userId": 1,
    "id": 1,
    "title": "sunt aut facere repellat",
    "body": "quia et suscipit suscipit recusandae consequuntur expedita"
  },
  {
    "userId": 1,
    "id": 2,
    "title": "qui est esse",
    "body": "est rerum tempore vitae"
  },
  {
    "userId": 1,
    "id": 3,
    "title": "ea molestias",
    "body": "et iusto sed quo"
  },
  {
    "userId": 1,
    "id": 4,
    "title": "eum et est occaecati",
    "body": "ullam et saepe reiciendis voluptatem adipisci"
  }
]

In this case, the response body is just a Json Array. This particular detail, makes it impossible to reach the same goal as before.

  • The projection shows the body which is defined as a complex object and not as a complex array as I would expect. enter image description here
  • In fact, unrolling data will be not allowed (the option is grayed out) enter image description here
  • Also, trying to directly write the body field into a csv sink will result in a validation error: enter image description here

Eventually, i found a workaround where you should use the Aggregate task combined with the collect() function. But, it is just a lot of overhead.

I still think I'm missing something, but it seems that Data Factory Data Flows don't support Json Array object as a response payload from a REST source.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,751 questions
{count} votes

1 answer

Sort by: Most helpful
  1. AnnuKumari-MSFT 31,726 Reputation points Microsoft Employee
    2023-10-20T18:08:43.4366667+00:00

    Hi Fabio Carello ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.

    As I understand your query , you want to flatten json data coming from REST API endpoint, however, your query is that for one of the REST API , you are not able to flatten the data since it's coming as complex datatype and not array datatype which is making it difficult to unroll by. Please correct me if my understanding is wrong.

    Your understanding is correct regarding the possible approach to use aggregate transformation with collect function which basically collects all values of the expression in the aggregated group into a array . There is no direct way to achieve this as cast transformation doesn't have array as the datatype in option.

    Other way to achieve this could be to use derived column transformation and use split function which splits a string based on a delimiter and returns an array of strings . It allows to select the body[] for unroll by in flatten transformation.

    User's image

    User's image

    User's image

    For similar scenario, kindly check this post: Flatten and Parsing Json using Azure Data Flow

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou