How to Flatten an Array JSON Structure in ADF

Jancy John 86 Reputation points
2022-10-18T16:49:21.513+00:00

Hi All,

I'm trying to flatten the following JSON structure in Azure Data Factory so that I can get the data from 'rows[]' in a tabular format to store in a SQL table. The JSON response has columns[] and rows[], where columns[] holds the field names and rows[] holds the corresponding data for those fields. I want the data from columns[].name as the column names and the data from rows[] as the rows in the result.
I have tried using an ADF Data Flow and the flatten transformation but couldn't find a way to get the desired result. I am new to Azure ADF and JSON, so any help or suggestions would be much appreciated.

Thanks in advance for your help!

The sample JSON File Structure is below:

{
  "id": "XXXX",
  "name": "XYZ",
  "type": "cost",
  "location": null,
  "sku": null,
  "eTag": null,
  "properties": {
    "nextLink": null,
    "columns": [
      {
        "name": "Amount",
        "type": "Number"
      },
      {
        "name": "Date",
        "type": "Number"
      },
      {
        "name": "Group",
        "type": "String"
      },
      {
        "name": "Type",
        "type": "String"
      },
      {
        "name": "Location",
        "type": "String"
      },
      {
        "name": "Currency",
        "type": "String"
      }
    ],
    "rows": [
      [
        0.0012,
        20220901,
        "RAM",
        "storage",
        "EU",
        "GBP"
      ],
    ]
  }
}

The desired result is:
Amount    Date      Group   Type     Location   Currency
0.0012    20220901  RAM     storage  EU         GBP
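
For anyone who wants to sanity-check this mapping outside ADF, the short Python/pandas sketch below reproduces the desired table from the sample JSON. It assumes the stray trailing comma after the inner row array is removed so the file parses, and sample.json is just a placeholder path. It only illustrates the columns[]/rows[] relationship; it is not the ADF solution I'm after.

import json
import pandas as pd

# Placeholder path to the sample response shown above.
with open("sample.json") as f:
    payload = json.load(f)

props = payload["properties"]
names = [c["name"] for c in props["columns"]]   # Amount, Date, Group, ...

# Each inner array in rows[] becomes one row, labelled with the column names.
df = pd.DataFrame(props["rows"], columns=names)
print(df)
#    Amount      Date Group     Type Location Currency
# 0  0.0012  20220901   RAM  storage       EU      GBP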


Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2022-10-19T08:46:15.613+00:00

    Hi @Anonymous ,

    Welcome to the Microsoft Q&A platform, and thanks for posting your question here.
    As I understand your question, you are trying to copy JSON into a SQL table by flattening the complex data.

    First of all, rows[] has a single nested array inside it; however, at the end of it there is a ',' which is not required here. To process it, I am removing one level of the array: instead of rows[[]], I am providing rows[]. You can follow the same steps on the original source data:

    • Add two source transformations pointing to the same source JSON. Select 'Array of documents' in the JSON settings for both sources.
    • Add a flatten transformation to each of the sources. For one flatten transformation, select rows[] to unroll by, and for the other, select columns[] to unroll by.

    [screenshot: flattenjson.gif]

    • Add a Surrogate Key transformation to each of them to create a new incremental column called rownum.
      [screenshot: surrogatekey.gif]
    • Add a Join transformation to join the two streams on the id and rownum columns.
    • Use a Select transformation to remove the unwanted columns after the join.
      [screenshot: join.gif]
    • Use a Pivot transformation and group by the id column. In the pivot key tab, use the name column, and in the pivoted columns provide an aggregate function such as min(rows), max(rows) or avg(rows). (The pandas sketch after this list mirrors this join-and-pivot logic.)
    • Go to the Data preview tab of the Pivot transformation and select the 'Map drifted' option; it will automatically add a derived column transformation. Select the id column and remove it, and reorder the other columns as needed.

    [screenshot: pivot1.gif]

    • Add a sink transformation with the SQL dataset to load the data into the SQL table.
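
    For a plain-code cross-check of the same sequence (flatten the values and the column names separately, number them, join on the row number, then pivot the names into headers), here is a rough pandas sketch. It is only an illustration of the logic, not something ADF produces; sample.json is a placeholder path, and it assumes the extra ',' has been removed so the file parses.

    import json
    import pandas as pd

    # Placeholder path to the sample JSON from the question.
    with open("sample.json") as f:
        doc = json.load(f)

    props = doc["properties"]

    # "Source 1": one record per value; taking rows[0] here plays the role of
    # removing one array level from rows[[]].
    values = pd.DataFrame({"id": doc["id"], "rows": props["rows"][0]})
    values["rownum"] = range(1, len(values) + 1)      # surrogate key

    # "Source 2": one record per entry in columns[] (name, type).
    cols = pd.DataFrame(props["columns"])
    cols["id"] = doc["id"]
    cols["rownum"] = range(1, len(cols) + 1)          # surrogate key

    # Join on id + rownum, then pivot the names into column headers.
    joined = values.merge(cols, on=["id", "rownum"])
    pivoted = joined.pivot(index="id", columns="name", values="rows")
    pivoted = pivoted[[c["name"] for c in props["columns"]]]   # keep original order
    print(pivoted.reset_index(drop=True))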

    Hope this helps. Please let us know if you have any further queries.

    2 people found this answer helpful.

1 additional answer

  1. Jancy John 86 Reputation points
    2022-10-19T15:29:15.867+00:00

    Hi @AnnuKumari-MSFT

    Thank you so much for your valuable time in resolving my query. It really helped, and I was able to replicate the same steps.

    However, I still need your help with one more query on this. This is entirely my mistake while drafting the sample JSON response.

    rows[] is actually a nested array with more than one group of items; I have around 140 arrays inside rows[] in the original JSON response. I only gave a small sample due to the confidentiality of the data and forgot to mention that rows[] is a nested array, rows[[]].

    I have replicated your solution in ADF, added one more flatten transformation (after flatten1) to flatten the nested rows, and got the first row of data in the output. However, I would like to have multiple rows, as in the real JSON. Could you please help me with the necessary changes to the Surrogate Key and Join transformations?

    [screenshot: adf-dataflow.png]

    Thank you very much for your kind help on this.

    The sample JSON File Structure is below:
    {
      "id": "XXXX",
      "name": "XYZ",
      "type": "cost",
      "location": null,
      "sku": null,
      "eTag": null,
      "properties": {
        "nextLink": null,
        "columns": [
          {
            "name": "Amount",
            "type": "Number"
          },
          {
            "name": "Date",
            "type": "Number"
          },
          {
            "name": "Group",
            "type": "String"
          },
          {
            "name": "Type",
            "type": "String"
          },
          {
            "name": "Location",
            "type": "String"
          },
          {
            "name": "Currency",
            "type": "String"
          }
        ],
        "rows": [
          [
            0.0012,
            20220901,
            "RAM",
            "storage",
            "EU",
            "GBP"
          ],
          [
            0.0015,
            20220902,
            "VM",
            "Compute",
            "EU",
            "GBP"
          ],
          [
            0.0017,
            20220903,
            "Disk",
            "storage",
            "EU",
            "GBP"
          ]
        ]
      }
    }

    The desired result is below (with the real source data I should have 140 rows and 6 columns):
    Amount    Date      Group   Type     Location   Currency
    0.0012    20220901  RAM     storage  EU         GBP
    0.0015    20220902  VM      Compute  EU         GBP
    0.0017    20220903  Disk    storage  EU         GBP
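
    To spell out in plain code what I am after (just a rough pandas sketch showing why each inner array needs its own row key so the join and pivot give one output row per inner array; sample_nested.json is only a placeholder path, not the real response):

    import json
    import pandas as pd

    # Placeholder path; the real response has around 140 inner arrays in rows[][].
    with open("sample_nested.json") as f:
        doc = json.load(f)

    props = doc["properties"]

    # Column lookup keyed by position (the columns[] flatten + surrogate key).
    cols = pd.DataFrame(props["columns"])
    cols["colnum"] = range(1, len(cols) + 1)

    # Value table keyed by (outer row, position). The outer row number is the
    # extra key that the second flatten / per-row surrogate key has to carry.
    values = pd.DataFrame(
        [
            {"rownum": r, "colnum": c, "value": v}
            for r, inner in enumerate(props["rows"], start=1)
            for c, v in enumerate(inner, start=1)
        ]
    )

    # Join position -> column name, then pivot into one output row per rownum.
    joined = values.merge(cols, on="colnum")
    result = joined.pivot(index="rownum", columns="name", values="value")
    result = result[[c["name"] for c in props["columns"]]]   # restore column order
    print(result.reset_index(drop=True))
    #    Amount      Date Group     Type Location Currency
    # 0  0.0012  20220901   RAM  storage       EU      GBP
    # 1  0.0015  20220902    VM  Compute       EU      GBP
    # 2  0.0017  20220903  Disk  storage       EU      GBP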

    1 person found this answer helpful.
