Parsing or flattening complex and poorly formatted JSON file in ADF

Sydney 21 Reputation points
2022-07-02T14:15:44.037+00:00

I have a JSON file which holds the below JSON value. The items in an array have different object formats/structures. How can it be parsed?
When I tried, items such as teachingCertificate from 2nd member JSON string are missing in ADF and I was not able to parse or flatten. I couldn't see some columns (eg.teachingCertificateLevel) of teachers in the parse in ADF if I tried to add teachingCertificateLevel, etc. to Input Columns list.

Results should be saved to an SQL tables.
Student table (destination):
id type active name gender DOB school
1234 Student true Jonny Dup female 2010-10-29 London high school

Teacher table (destination):
id type title name TeachingCertificateWhenObtained TeachingCertificateLevel
1235 Teacher Dr. Jonny John Clinton 1999-01-01 2

JSON file (source)
{
"members":
[
{
"resourceType": "Student",
"id": "1234",
"active": true,
"name":
{
"family": "Dup",
"given": [ "Jonny" ]
}
,
"gender": "female",
"birthDate": "2010-10-29",
"school": { "name": "London High School" }
},
{
"resourceType": "Teacher",
"id": "1235",
"name":
{
"Title": "Dr."
"family": "Clinton",
"given": ["Jonny", "John"]
}
,
"TeachingCertificate":
{
"yearObtained": "1999-01-01",
"level": "2"
}
}
]
}

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,196 questions
0 comments No comments
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,442 Reputation points Microsoft Employee
    2022-07-05T23:17:25.243+00:00

    Hello @Sydney ,

    Thanks for the question and using MS Q&A platform.

    As per my understanding you would like to split the above complex json based on resourceType column and load them to respective table in your sink. Please correct if my understanding is incorrect.

    You can use mapping data flow transformations to achieve the above requirement.

    1. First point to your source in source transformation and then under source options go to Json settings and select Array of documents
      List item
    2. Then have a flatten transformation to flatten the JSON source. Unroll by members as shown below and in the input column select as shown below so that all the columns are taken.
      217868-image.png
    3. Then have a conditional split transformation to split the unrolled json source base on the resourceType column values as shown below:
      217914-image.png
    4. Then have a derived column to transform the name, CertificateLevel, CertificateObtainedYear, Title columns as per your requirment as shown below.

    217922-image.png

    For name you can use this expression: reduce(name.given, '', #acc + #item + ' ', #result + name.family)
    For Certification level you can use this: byPath('TeachingCertificate.level')
    For CertificationYearObtained use this: byPath('TeachingCertificate.yearObtained')
    For title use this : byPath('name.Title')

    1. Next have a sink transformation and select only the column you need for your sink table and map them as shown below.

    217826-image.png

    1. For Student stream have a derived column to transform the name column as per your requirment.

    217931-image.png

    1. Then have a sink transformation and select only the columns required for your student table and map them accordingly as shown below.

    217838-image.png

    This will copy the data to respective tables in your sink.

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    2 people found this answer helpful.
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Sydney 21 Reputation points
    2022-07-09T14:16:14.903+00:00

    Thanks for the answer. If the student name is in an array format and teacher's name is not in array, how can it be flattened? As shown below, the JSON has got different format in the same member array. The name column in My ADF is blank for both a student and a teacher.
    ...
    ...
    [{
    "resourceType": "Student",
    "id": "1234",
    "active": true,
    "name":
    {
    "family": "Dup",
    "given": [ "Jonny" ]
    }
    }
    ,
    {
    "resourceType": "Teacher",
    "id": "1235",
    "name":
    {
    "Title": "Dr."
    "full_name": "Clinton Smith",
    }
    }
    ]
    .....