Data Factory: Multiple collection reference for XML copy to SQL

tnw004 6 Reputation points
2020-08-25T20:10:03.647+00:00

I am attempting to set up a data factory pipeline which will ingest XML data to an Azure SQL Database. The XML follows this structure:

<schools>
    <school>
        <students>
            <student></student>
            <student></student>
            <student></student>
        </students>
    </school>
    <school>
        <students>
            <student></student>
            <student></student>
            <student></student>
        </students>
    </school>
<schools>

I've set up multiple tables in SQL to accept this data. Put simply, there is a schools table which will take all the schools and a students table to accept all the students.

I set up the copy task in data factory and had to set the "Collection reference" to <school> in order to get it to iterate over the schools. If I don't do this it only loads in the first school and ignores the rest.

This works just fine for loading in schools. The problem is with the next copy task which looks at the same XML and tries to copy all the students from all the schools into the students table.

If I set the collection reference to <school> it will only copy in the first student from every school and ignores the rest of the students. If I set the collection reference to <student> it will copy all the students in the first school but ignore the rest of the schools and students.

I'd like to iterate over ALL schools AND students in order to load in all students from all schools but I don't see any easy way to do this. Is there some way to set multiple collection references to both schools and students?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,272 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 24,006 Reputation points Microsoft Employee
    2020-08-27T20:30:36.117+00:00

    @tnw004 I am not sure if this will help, but you can split the shredding, so some is done by data factory, and the rest in your SQL.

    I took your sample, and added values to look like:

       <schools>  
                <school name="school1">  
                    <students>  
                        <student>abc</student>  
                        <student>def</student>  
                        <student>ghi</student>  
                    </students>  
                </school>  
                <school name="school2">  
                    <students>  
                        <student>123</student>  
                        <student>456</student>  
                        <student>789</student>  
                    </students>  
                </school>  
           </schools>  
    

    and mapped as
    20985-image.png

    • I disabled detect data type in source

    and the result is

    "[\"abc\",\"def\",\"ghi\"]","school1",  
    "[\"123\",\"456\",\"789\"]","school2",  
    

    Then, all you need do is replace \" with " and shred the array.