Hi @sachin gupta
Thanks for the question and for using the Microsoft Q&A platform.
I saw that the same question was asked in this Stack Overflow thread - https://stackoverflow.com/questions/78929093/flatten-a-json-nested-array-in-azure-synapse-pipeline - where one of our support engineers has already answered it.
Answer Summary (from SO):
Currently, Dataflow doesn't have a feature to flatten all the nested arrays in one step as per your requirement. In your case, you can try the below workaround using a combination of flatten and join transformations.
For some reason, your `timestamps` array gives only null values when its type is an integer array, so I have set it to a string array. Change this type in the source dataset schema as per your requirement.
First, use a flatten transformation to flatten the `data` array and surface the inner arrays as columns; use rule-based mapping for this. It will give the result shown below. Then follow the transformations shown in the image above.
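As a rough illustration of what this first flatten produces, here is a minimal Python sketch on a hypothetical payload shaped like the source schema (the sample values are assumptions, not data from the original question):

```python
# Hypothetical input shaped like the source schema:
# result[] -> data[] -> { dimensions[], timestamps[], values[] }
payload = {
    "result": [
        {
            "data": [
                {
                    "dimensions": ["d1", "d2"],
                    "timestamps": ["t1", "t2"],
                    "values": ["10", "20"],
                }
            ]
        }
    ]
}

# flatten1: unroll result.data so each inner object becomes one row,
# with the inner arrays surfaced as array columns (rule-based mapping).
rows = [item for result in payload["result"] for item in result["data"]]

print(rows)
```

Each resulting row carries `dimensions`, `timestamps`, and `values` as array columns, which is the shape the later flattens work on.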
Use a `flatten2` transformation to flatten the `dimensions` array. In this flatten, give the `dimensions` array for both the Unroll by and Unroll root options; this flattens the array. Then take a `select2` transformation and remove the `timestamps` column.
Take a new branch from `flatten2`, add another select transformation, `select1`, and remove the `values` column here.
Now, add another flatten transformation, `flatten3`, after `select2` and flatten the `values` array column. For both the Unroll by and Unroll root options, give only the `values` column.
Similarly, add `flatten4` after `select1` and do the same for the `timestamps` array column.
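The branch-and-unroll steps above can be sketched in plain Python; `row` here is assumed to be one output row of the first flatten, with made-up sample values:

```python
# One row out of flatten1 (sample values are assumptions).
row = {"dimensions": ["d1", "d2"], "timestamps": ["t1", "t2"], "values": ["10", "20"]}

# flatten2: unroll the dimensions array -> one row per dimension value.
flatten2 = [{**row, "dimensions": d} for d in row["dimensions"]]

# select2 drops timestamps, then flatten3 unrolls the values array.
flatten3 = [
    {"dimensions": r["dimensions"], "values": v}
    for r in flatten2
    for v in r["values"]
]

# select1 drops values, then flatten4 unrolls the timestamps array.
flatten4 = [
    {"dimensions": r["dimensions"], "timestamps": t}
    for r in flatten2
    for t in r["timestamps"]
]

print(len(flatten3), len(flatten4))
```

Because `values` and `timestamps` have the same length per row, both branches end up with the same number of rows in the same order, which is what makes the surrogate-key join in the next step line up.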
After that, add a SurrogateKey transformation, `surrogateKey1`, after `flatten3`; create a key column `key1` and give both the start value and the step as 1. Do the same for the `flatten4` branch, using the same key column name.
Now, inner join both surrogate key transformations using a join transformation on the common column `key1`.
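The surrogate keys simply number the rows 1, 2, 3, … on each branch, so the inner join on `key1` pairs rows by position. In Python terms (the two lists stand in for the two branches; sample rows are assumptions):

```python
# Two branches with matching row order (assumed sample rows).
values_branch = [{"dimensions": "d1", "values": "10"}, {"dimensions": "d1", "values": "20"}]
times_branch = [{"dimensions": "d1", "timestamps": "t1"}, {"dimensions": "d1", "timestamps": "t2"}]

# surrogateKey1/surrogateKey2: start at 1, step 1 -> enumerate(..., start=1).
keyed_values = list(enumerate(values_branch, start=1))
keyed_times = {k: r for k, r in enumerate(times_branch, start=1)}

# join1 (inner, on key1) followed by the final select: merge matching rows,
# keeping one dimensions column and dropping the key.
joined = [
    {**r, "timestamps": keyed_times[k]["timestamps"]}
    for k, r in keyed_values
    if k in keyed_times
]

print(joined)
```

This positional pairing only works because both branches were produced from the same `flatten2` output in the same order.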
Now it will have the required results; use another select transformation to remove the duplicate columns and the extra `key1` column. You will then get the desired result.
Use the below dataflow script to build your dataflow.
```
source(output(
result as (data as (dimensions as string[], timestamps as string[], values as string[])[])[]
),
allowSchemaDrift: true,
validateSchema: false,
ignoreNoFilesFound: false,
documentForm: 'singleDocument') ~> source1
source1 foldDown(unroll(result.data, result),
mapColumn(
each(result.data,match(true()))
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten1
flatten1 foldDown(unroll(dimensions, dimensions),
mapColumn(
dimensions,
timestamps,
values
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten2
flatten2 select(mapColumn(
dimensions,
timestamps
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select1
flatten2 select(mapColumn(
dimensions,
values
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select2
select2 foldDown(unroll(values, values),
mapColumn(
dimensions,
values
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten3
select1 foldDown(unroll(timestamps, timestamps),
mapColumn(
dimensions,
timestamps
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten4
flatten3 keyGenerate(output(key1 as long),
startAt: 1L,
stepValue: 1L) ~> surrogateKey1
flatten4 keyGenerate(output(key1 as long),
startAt: 1L,
stepValue: 1L) ~> surrogateKey2
surrogateKey1, surrogateKey2 join(surrogateKey1@key1 == surrogateKey2@key1,
joinType:'inner',
matchType:'exact',
ignoreSpaces: false,
broadcast: 'auto')~> join1
join1 select(mapColumn(
dimensions = flatten3@dimensions,
values,
dimensions = flatten4@dimensions,
timestamps
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select3
select3 sink(allowSchemaDrift: true,
validateSchema: false,
input(
result as (data as (dimensions as string[], timestamps as string[], values as string[])[])[]
),
umask: 0022,
preCommands: [],
postCommands: [],
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> sink1
```
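Outside Synapse, the same reshaping can be sketched with pandas `explode`; this is only an analogy on assumed sample data, not part of the dataflow itself:

```python
import pandas as pd

# Assumed sample row matching the source schema (not from the original question).
data = [{"dimensions": ["d1", "d2"], "timestamps": ["t1", "t2"], "values": ["10", "20"]}]

df = pd.DataFrame(data).explode("dimensions")  # flatten2: one row per dimension

# flatten3 / flatten4: unroll each branch separately, row order preserved.
vals = df[["dimensions", "values"]].explode("values").reset_index(drop=True)
times = df[["dimensions", "timestamps"]].explode("timestamps").reset_index(drop=True)

# Positional join, mirroring the surrogate-key inner join on key1.
out = pd.concat([vals, times["timestamps"]], axis=1)
print(out)
```

Here the shared integer index after `reset_index` plays the role of the surrogate key column.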
Hope this helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?". If you have any further queries, do let us know.