question

HarshGandhi-4838 asked HarshGandhi-4838 commented

Process schema-changing XML via data flow

The aim is to process an XML file with its XSD, flatten it, and store the result as Parquet.
The XML file itself does not contain the XSD details.
Challenges
1. XML tags have properties such as minOccurs = 0 and maxOccurs = unbounded. This means the tag is optional and can appear multiple times when it exists. Such a tag may appear in one day's XML instance and not in another day's, or in one record and not in a subsequent record.
2. The XSD schema changes every few days, so the idea is to build a dynamic-mapping data flow to process this XML file.
3. Related to 1: when a parent tag is mandatory (minOccurs = 1) and has a child attribute (minOccurs = 0, maxOccurs = unbounded) of an element, a single day's XML instance file can contain a first record whose first occurrence lacks this optional tag and whose second occurrence has it. As a result, once the XML is flattened, the data flow places the second occurrence's value as the first member of the array.

How can a data flow process this type of XML?
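
The alignment problem in challenge 3 can be illustrated with a minimal Python sketch. This uses only the standard library and is not Data Flow itself. The element names (`record`, `item`, `code`, `note`) and the sample document are hypothetical, and the per-occurrence alignment shown is an assumption about the desired output. The idea is to look up the optional child inside each occurrence of the parent, so that a missing tag becomes a null in that position and later values are not shifted forward.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "note" plays the role of the optional child
# (minOccurs = 0). The first <item> of record r1 lacks it, the second has it.
SAMPLE = """
<root>
  <record id="r1">
    <item><code>A</code></item>
    <item><code>B</code><note>second</note></item>
  </record>
  <record id="r2">
    <item><code>C</code></item>
  </record>
</root>
"""


def flatten(xml_text, optional_fields=("note",)):
    """Return one row per <item>, keeping optional fields aligned per item."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall("record"):
        for index, item in enumerate(record.findall("item")):
            row = {
                "record_id": record.get("id"),
                "item_index": index,
                "code": item.findtext("code"),
            }
            for name in optional_fields:
                # findtext returns None when the optional child is absent,
                # so the gap stays attached to this occurrence.
                row[name] = item.findtext(name)
            rows.append(row)
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

In this sketch the first `item` of record `r1` keeps a null `note` and the value `second` stays with the second `item`, rather than moving into the first array slot.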




azure-data-factory

@MarkKromer-MSFT, could you please help with a solution?


Would it be possible to include a snippet of your XML sample?


@MarkKromer-MSFT, please find attached the sample XML file with the expected output versus the actual output.


[1]: /answers/storage/attachments/191911-anonymous-data.pdf

anonymous-data.pdf (280.7 KiB)

@MarkKromer-MSFT, do you have an update on this query?


Hi @HarshGandhi-4838,
Just checking back to see whether your query has been resolved. If not, please reply so that we can follow up with more details and try to help.


@AnnuKumari-MSFT, not yet. I am still waiting for a response. As requested, I have uploaded the sample data along with the actual versus expected output. Please help to get this resolved as soon as possible.


1 Answer

HarshGandhi-4838 answered

@MarkKromer-MSFT, do you have an update?
