I’m trying to understand how the REST source works in a data factory Data Flow. I will start with a simple case.
This is a sample response of my API (https://reqres.in/api/users):
{
"page": 2,
"per_page": 6,
"total": 12,
"total_pages": 2,
"data": [
{
"id": 7,
"email": "michael.lawson@reqres.in",
"first_name": "Michael",
"last_name": "Lawson",
"avatar": "https://reqres.in/img/faces/7-image.jpg"
},
{
"id": 8,
"email": "lindsay.ferguson@reqres.in",
"first_name": "Lindsay",
"last_name": "Ferguson",
"avatar": "https://reqres.in/img/faces/8-image.jpg"
},
],
"support": {
"url": "https://reqres.in/#support-heading",
"text": "To keep ReqRes free, contributions towards server costs are appreciated!"
}
}
As you can see, the response payload is a Json Object with a data
array containing the actual data content.
I need to unroll the data
field and save in a csv file using a Data Flow. That's pretty straightforward:
- Source Projection is defining a field named
data
of type complex array
inside the body
column:
- Let's now unroll by
data
field and use an explicit mapping:
- Just make data flow into the Sink with the auto-map feature:
Now, let's change our source, with a slightly different response (https://jsonplaceholder.typicode.com/posts):
[
{
"userId": 1,
"id": 1,
"title": "sunt aut facere repellat",
"body": "quia et suscipit suscipit recusandae consequuntur expedita"
},
{
"userId": 1,
"id": 2,
"title": "qui est esse",
"body": "est rerum tempore vitae"
},
{
"userId": 1,
"id": 3,
"title": "ea molestias",
"body": "et iusto sed quo"
},
{
"userId": 1,
"id": 4,
"title": "eum et est occaecati",
"body": "ullam et saepe reiciendis voluptatem adipisci"
}
]
In this case, the response body is just a Json Array. This particular detail, makes it impossible to reach the same goal as before.
- The projection shows the
body
which is defined as a complex object
and not as a complex array as I would expect.
- In fact, unrolling data will be not allowed (the option is grayed out)
- Also, trying to directly write the
body
field into a csv sink will result in a validation error:
Eventually, i found a workaround where you should use the Aggregate task combined with the collect() function. But, it is just a lot of overhead.
I still think I'm missing something, but it seems that Data Factory Data Flows don't support Json Array object as a response payload from a REST source.