Processing XML with a changing schema via Data Flow

Harsh Gandhi
2022-04-08T07:45:40.787+00:00

The aim is to process an XML file using its XSD, flatten it, and store the result as Parquet.
The XSD details are not included in the XML file.
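Because the XML carries no schema reference, the XSD has to be read on its own. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` running outside of Data Factory. It pulls each element's `minOccurs`/`maxOccurs` out of an XSD so that the column list could be regenerated whenever the schema changes. The schema and its element names (`Orders`, `Order`, `Id`, `Line`, `Sku`, `Note`) are hypothetical. `ref=` elements and named types are not resolved.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XSD namespace as ElementTree spells it

# Hypothetical schema: Line is mandatory and repeating, Note is optional and repeating.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Orders">
    <xs:complexType><xs:sequence>
      <xs:element name="Order" maxOccurs="unbounded">
        <xs:complexType><xs:sequence>
          <xs:element name="Id" type="xs:string"/>
          <xs:element name="Line" minOccurs="1" maxOccurs="unbounded">
            <xs:complexType><xs:sequence>
              <xs:element name="Sku" type="xs:string"/>
              <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence></xs:complexType>
          </xs:element>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""


def element_cardinalities(xsd_text):
    """Map each declared element path (e.g. 'Orders/Order/Line/Note') to
    (minOccurs, maxOccurs). maxOccurs="unbounded" becomes None.
    XSD defaults apply when an attribute is omitted: minOccurs=1, maxOccurs=1."""
    result = {}

    def walk(node, path):
        for child in node:
            name = child.get("name")
            if child.tag == XS + "element" and name:
                child_path = path + (name,)
                max_raw = child.get("maxOccurs", "1")
                result["/".join(child_path)] = (
                    int(child.get("minOccurs", "1")),
                    None if max_raw == "unbounded" else int(max_raw),
                )
                walk(child, child_path)  # descend into its nested type
            else:
                # complexType, sequence, choice, ... do not add a path level
                walk(child, path)

    walk(ET.fromstring(xsd_text), ())
    return result


for path, (lo, hi) in element_cardinalities(SAMPLE_XSD).items():
    print(path, lo, "unbounded" if hi is None else hi)
```

Paths with a minimum of 0 are the columns that may be missing from a given day's file. Those with an unbounded maximum are the ones that need array handling when flattening.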
Challenges

  1. XML elements have properties such as minOccurs = 0 and maxOccurs = unbounded. This means the element is optional and, if present, can appear multiple times. Such an element may appear in one day's XML file and not in another's, or it may appear in one record and not in the next.
  2. The XSD schema changes every few days, so the idea is to build a data flow with dynamic mapping to process this XML file.
  3. Following on from point 1: a parent element may have minOccurs = 1 (mandatory) and a child element with minOccurs = 0, maxOccurs = unbounded. In a single day's XML file, the first occurrence of the parent in the first record may lack this optional child while the second occurrence has it. Because of this, once the XML is flattened, the data flow places the value from the second occurrence as the first member of the array.
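The following is a sketch of why point 3 happens and one way around it. It uses plain Python with hypothetical element names: `Line` is the mandatory repeating parent and `Note` is the optional repeating child. Flattening each leaf into its own array per record loses track of which parent occurrence a value came from. Emitting one row per parent occurrence, and keeping the optional child as a list on that row, preserves it. Whether this corresponds to unrolling by the parent array rather than the child in Data Flow's Flatten transformation is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical day file: the first <Line> has no <Note>, the second has one.
SAMPLE_XML = """<Orders>
  <Order>
    <Id>1</Id>
    <Line><Sku>A</Sku></Line>
    <Line><Sku>B</Sku><Note>gift</Note></Line>
  </Order>
</Orders>"""


def naive_columns(record, leaf_tags):
    """One array per leaf tag across the whole record. Positions are lost:
    the only Note lands at index 0, i.e. next to the first Line."""
    return {tag: [e.text for e in record.iter(tag)] for tag in leaf_tags}


def flatten_per_occurrence(record, parent_tag, optional_tag):
    """One row per occurrence of the mandatory repeating parent. The optional,
    repeating child stays a list on its own parent ([] when absent), so a value
    from the second occurrence cannot slide into the first."""
    rows = []
    for pos, parent in enumerate(record.findall(parent_tag)):
        row = {"position": pos}
        for child in parent:
            if child.tag != optional_tag:
                row[child.tag] = child.text
        row[optional_tag] = [c.text for c in parent.findall(optional_tag)]
        rows.append(row)
    return rows


order = ET.fromstring(SAMPLE_XML).find("Order")
print(naive_columns(order, ["Sku", "Note"]))  # {'Sku': ['A', 'B'], 'Note': ['gift']}
for row in flatten_per_occurrence(order, "Line", "Note"):
    print(row)
```

In the naive output, pairing `Sku` and `Note` by index attaches "gift" to line A, which matches the symptom described in point 3. The per-occurrence rows keep it on line B.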

How can a data flow process this type of XML?

Azure Data Factory

  1. Harsh Gandhi
    2022-04-19T08:31:38.737+00:00

    @MarkKromer-MSFT, do you have an update?
