Hi,
I need some assistance with ADF Data Flow using an xml data source. We have a data lake full of xml which I need to process. We do have a schema for the xml, but the xml isn't validated against it. We have some optional attributes and nodes in the xml. I have created a Data Flow and imported the schema from the Data Source Projection Tab. In my mappings I map all the fields manually through to the Sink.
Everything works fine if all the attributes or nodes are present, however if the attributes are not in one of the xml files, then everything falls over. I have seen a few videos of using schema-less Data Flow and use late binding to map everything using rule based mapping or pattern based mapping, but nothing is working with xml and I need a little help with what to do and syntax so that I can pull out all the nodes and attributes from the xml. Some of the nodes are repeated so they become arrays in the Data Flow which need to be flattened also.
So if I have the 2 xml files below how can I do it schema-less using rule based mappings and pattern matching? Note that at the entrant level teamname and result are optional and in the second example they are not present.
<data xmlns="http://www.mycompany.co.uk/feed/master/cg" xmlns:cg="http://www.mycompany.co.uk/feed/master/cg" category="AA" country="XX" date="2020-07-14" id="1234" name="test" type="master" version="1.0.0">
<game code="AB001" date="2020-07-14" name="Test Gane" sportcode="AB" version="1">
<event date="2020-07-14" id="AB00020" inplay="yes" numberofperiods="4" progresscode="F" result="86-89" starttime="18:19" status="O" title="Player1 v Player2">
<periods>
<period actualendtime="2020-07-14T17:26:02Z" actualstarttime="2020-07-14T17:20:14Z" number="1" result="14-19" status="Finished" type="quarter"/>
<period actualendtime="2020-07-14T17:33:39Z" actualstarttime="2020-07-14T17:27:02Z" number="2" result="18-24" status="Finished" type="quarter"/>
<period actualendtime="2020-07-14T17:40:25Z" actualstarttime="2020-07-14T17:34:33Z" number="3" result="26-22" status="Finished" type="quarter"/>
<period actualendtime="2020-07-14T17:47:38Z" actualstarttime="2020-07-14T17:40:27Z" number="4" result="28-24" status="Finished" type="quarter"/>
</periods>
<entrants>
<entrant avatar="TeamA" id="1" isteam="no" number="1" score="86" teamname="Team1" result="Loser">
<player firstname="Fred" form="L-L-W-L-L" gamerhandle="Gamer1" id="9" rank="9" surname="Bloggs"/>
</entrant>
<entrant avatar="TeamB" id="2" isteam="no" number="2" score="89" teamname="Team2" result="winner">
<player firstname="John" form="L-L-L-L-L" gamerhandle="Gamer2" id="14" rank="14" surname="Doe"/>
</entrant>
</entrants>
</event>
</game>
</data>
<data xmlns="http://www.mycompany.co.uk/feed/master/cg" xmlns:cg="http://www.mycompany.co.uk/feed/master/cg" category="AA" country="XX" date="2020-07-15" id="1235" name="test" type="master" version="1.0.0">
<game code="AB002" date="2020-07-14" name="Test Gane" sportcode="AB" version="1">
<event date="2020-07-15" id="AB00021" inplay="yes" numberofperiods="4" progresscode="O" result="86-89" starttime="18:19" status="O" title="Player1 v Player2">
<periods>
<period actualendtime="2020-07-15T17:26:02Z" actualstarttime="2020-07-15T17:20:14Z" number="1" result="14-19" status="Finished" type="quarter"/>
<period actualendtime="2020-07-15T17:33:39Z" actualstarttime="2020-07-15T17:27:02Z" number="2" result="18-24" status="Finished" type="quarter"/>
<period actualendtime="2020-07-15T17:40:25Z" actualstarttime="2020-07-15T17:34:33Z" number="3" result="26-22" status="Finished" type="quarter"/>
<period actualendtime="2020-07-15T17:47:38Z" actualstarttime="2020-07-15T17:40:27Z" number="4" result="28-24" status="Finished" type="quarter"/>
</periods>
<entrants>
<entrant avatar="TeamA" id="1" isteam="no" number="1" score="90">
<player firstname="Fred" gamerhandle="Gamer1" id="9" rank="9" surname="Bloggs"/>
</entrant>
<entrant avatar="TeamB" id="2" isteam="no" number="2" score="88">
<player firstname="John" gamerhandle="Gamer2" id="14" rank="14" surname="Doe"/>
</entrant>
</entrants>,
</event>
</game>
</data>
If I add the source xml without schema, and then preview the data I see things like data{} and below that @category, @country, @apl , etc but if I try and use a Select or Derived Column Mapping to map the attribute it doesn't pick it up and map it, and the same goes for @teamname and @result. I have tried things like toString(byName("@teamname")) in a column mapping with and without the '@' sign but I can't pick anything out of the xml.