Parsing Complex JSON in ADF

Sydney 21 Reputation points
2022-07-09T23:50:22.053+00:00

I have a complex JSON file. The name section has got an array as well as a single string value in the node.
How can the column be flattened?
Student members: names in array
Teachers: names as string

JSON file (source)
{
"members":
[
{
"resourceType": "Student",
"id": "1234",
"active": true,
"name":
{
"family": "Dup",
"given": [ "Jonny", "John" ]
}
,
"gender": "female",
"birthDate": "2010-10-29",
"school": { "name": "London High School" }
},
{
"resourceType": "Teacher",
"id": "1235",
"name":
{
"Title": "Dr."
"fullname": "Jonny Clinton",
}
,
"TeachingCertificate":
{
"yearObtained": "1999-01-01",
"level": "2"
}
}
]
}

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
0 comments No comments
{count} votes

3 answers

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,161 Reputation points
    2022-07-12T03:12:07.487+00:00

    Hello @Sydney ,
    Thanks for the question and using MS Q&A platform.

    as I understand, you want to flatten two different data structures attributed to similar properties.

    I worked closely with Kranthi on your [initial request][1]. It was a challenge. As I recall, we split the data into two streams, one for Student and one for Teacher. Since we split into separate streams, we can then treat each differently. Thus string versus array is not the problem.

    The flattening implies you want each name in the array to have its own column. The challenge I see, is that array does not have a fixed length. We could have["Jimmy", "Jon", "Smith", "Wesson", "The third"] or we could have ["Rodrigo","Cortez"].

    Having a changing quantity of columns becomes impossible to work with.
    However if you wanted to merge the items in an array together into a single string, we solve the issue of name length.

    This merge was accomplished in the original ask by:

    For name you can use this expression: reduce(name.given, '', #acc + #item + ' ', #result + name.family)

    Reduce is a function used to collect all the items in an array and render them into a single result. Usually paired with the map function, and is the origin of the term "map-reduce".
    name.given is the array of names. To each element in turn we apply concatenate( everything_so_far , the_next_item)
    Then after all items have been done, we do concatenate( the_result, name.family)`

    Does this make sense? Is it agreeable? Do you have any suggestions or questions I can clarify or find alternate solution for?

    0 comments No comments

  2. Sydney 21 Reputation points
    2022-07-12T11:45:19.663+00:00

    Thanks again. I found that ADF is inconsistent.
    Now both are shown as array even though one is a string value, the other is an array format. The ADF is becoming smarter and smarter now. However I used reduce function to get given names and the teacher's name looks ok but the student's given name is in [ ]. What could cause this?

    219820-image.png

    219797-image.png

    219893-image.png
    219840-image.png

    Both have the same expression to get the given name(s).
    219880-image.png


  3. Sydney 21 Reputation points
    2022-07-16T07:26:33.347+00:00

    The below Json is valid but I cannot flatten the name section because one is a string and the other is an array. The teacher's given name is correct. But the student given name is returned as an JSON array. ADF treated [ "Susan2", "Susan3" ] as one string value. Does it need a parse transforming from a string to JSON array?
    {
    "members":
    [
    {
    "resourceType": "Student",
    "id": "1234",
    "active": true,
    "name":
    {
    "family": "Dup",
    "given": [ "Susan2", "Susan3" ]
    }
    ,
    "gender": "female",
    "birthDate": "2010-10-29",
    "school": { "name": "London High School" }
    },
    {
    "resourceType": "Teacher",
    "id": "1235",
    "name":
    {
    "Title": "Dr.",
    "family": "Clinton",
    "given": "Jonny"
    }
    ,
    "TeachingCertificate":
    {
    "yearObtained": "1999-01-01",
    "level": "2"
    }
    }
    ]
    }

    221313-image.png

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.