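The aim is to process an XML file against its XSD, flatten it, and store the result as Parquet.
The XSD details are not present in the XML file itself, so the schema information has to come from the separate XSD.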
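Because the cardinality rules live only in the XSD, one option is to read them from the XSD before flattening. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It walks the schema and records `(minOccurs, maxOccurs)` for each locally declared element path, applying the XSD defaults of 1 when an attribute is omitted and using `None` for `unbounded`. The schema, the element names (`Records`, `Record`, `Item`, `Id`, `Tag`) and the function name `element_cardinalities` are hypothetical. The sketch also assumes inline declarations with `name` attributes, so `ref`-based or type-reuse schemas would need more handling.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: a mandatory, repeatable Item with an optional, repeatable Tag.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Records">
    <xs:complexType><xs:sequence>
      <xs:element name="Record" maxOccurs="unbounded">
        <xs:complexType><xs:sequence>
          <xs:element name="Item" minOccurs="1" maxOccurs="unbounded">
            <xs:complexType><xs:sequence>
              <xs:element name="Id" type="xs:string"/>
              <xs:element name="Tag" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence></xs:complexType>
          </xs:element>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>
"""


def element_cardinalities(xsd_text):
    """Map each named element path (e.g. 'Records/Record') to (minOccurs, maxOccurs).

    maxOccurs is None when the schema says 'unbounded'.
    """
    root = ET.fromstring(xsd_text)
    result = {}

    def walk(node, prefix):
        for child in node:
            if child.tag == XS + "element" and child.get("name"):
                name = child.get("name")
                path = f"{prefix}/{name}" if prefix else name
                # XSD defaults: both minOccurs and maxOccurs are 1 when omitted.
                min_occurs = int(child.get("minOccurs", "1"))
                max_raw = child.get("maxOccurs", "1")
                max_occurs = None if max_raw == "unbounded" else int(max_raw)
                result[path] = (min_occurs, max_occurs)
                walk(child, path)
            else:
                # Descend through complexType, sequence, choice, etc. without
                # adding them to the path.
                walk(child, prefix)

    walk(root, "")
    return result


if __name__ == "__main__":
    for path, card in element_cardinalities(SAMPLE_XSD).items():
        print(path, card)
```

Regenerating this map whenever the XSD changes would give a flattening step an up-to-date list of optional, repeatable elements without hard-coding them.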
1. In the XSD, elements are declared with attributes such as minOccurs = 0 and maxOccurs = unbounded. This means the element is optional and, when present, can appear multiple times. Such an element may appear in one day's XML file and not in another's, or in one record and not in the next.
2. The XSD changes every few days, so the idea is to build a dynamic mapping data flow to process the XML file.
3. Related to point 1: a parent element is mandatory (minOccurs = 1) and has an optional, repeatable child element (minOccurs = 0, maxOccurs = unbounded). In a single day's XML file, the first occurrence of the parent in the first record does not contain this optional child, while the second occurrence in the same record does. After flattening, the data flow puts the value from the second occurrence in the first position of the array, so it appears to belong to the first occurrence.
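The misalignment in point 3 happens when the optional child's values are collected at record level, so position 0 of the child array no longer lines up with position 0 of the parent. A minimal sketch of an alignment-preserving flatten, in plain Python, is shown here. It is not the data flow's own behaviour: it emits one row per occurrence of the repeating parent and gives an empty list when the optional child is absent. The function `flatten_items`, its parameters and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the first Item has no Tag, the second has two.
SAMPLE_XML = """
<Records>
  <Record>
    <Item><Id>A</Id></Item>
    <Item><Id>B</Id><Tag>x</Tag><Tag>y</Tag></Item>
  </Record>
</Records>
"""


def flatten_items(xml_text, record_tag, item_tag, list_fields):
    """Return one row per occurrence of item_tag, keyed by its position.

    Children named in list_fields (optional, repeatable elements) are
    collected per occurrence, so an absent child yields [] for that
    occurrence instead of shifting a later occurrence's values forward.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for rec_idx, record in enumerate(root.iter(record_tag)):
        for item_idx, item in enumerate(record.findall(item_tag)):
            row = {"record_index": rec_idx, "item_index": item_idx}
            for child in item:
                if child.tag in list_fields:
                    row.setdefault(child.tag, []).append(child.text)
                else:
                    row[child.tag] = child.text
            for field in list_fields:
                row.setdefault(field, [])  # absent optional child -> empty list
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in flatten_items(SAMPLE_XML, "Record", "Item", {"Tag"}):
        print(row)
```

Here the first `Item` keeps `Tag == []` and the second keeps `["x", "y"]`, so the values stay with the occurrence they came from. The rows could then be written to Parquet, for example with `pandas.DataFrame(rows).to_parquet(path)`, which requires pyarrow or fastparquet. In the mapping data flow itself, the analogous choice is presumably to unroll by the repeating parent element rather than the record in the Flatten transformation. That is an assumption and has not been checked against the service.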
How can the data flow process this kind of XML so that each optional child stays aligned with the occurrence it belongs to?