How to parse XML files with optional fields in ADF?

Question

How to parse XML files with optional fields in ADF?

Ridham 21

I have a XML file where we need flattening. The problem is some fields in the XML are optional(as per XSD). The field might be present or might not be. So fixed mapping based on XML is not working. We need mapping based on XSD so every scenario is covered.

For example:-

In this XML file, we have <Audit Record> inside ItemData, but there maybe a possibility that <Audit Record> might be present directly under <ItemGroupData> and <ItemData> is not there at all. Or some other field is present.

Is there any way we can do dynamic mapping based on XSDs?

Answer accepted by question author

0 additional answers

Your answer

Answer 1

Hello @Ridham and welcome to Microsoft Q&A.

As I understand there are two asks here:

How to flatten and get the optional fields which might not always appear in the result of "import schema"
Use the XSD to inform the schema mapping

I haven't worked with XSD before. From what I am learning, it basically IS the schema definition. In both Copy activity and Data flow, there are options to validate the XML data against your choice of XSD or DTD. Given this validation feature, it is made clear we do somehow read the XSD. It would make sense to have a feature where we import schema from the validation ... but since you are asking this question, I'm guessing it is not being used. Unless "Allow Schema Drift" causes "import schema" to ignore the XSD in favor of looking at the data instead.

Anyway, to the useful information. Use of rule-based mapping in flatten transformation allows to catch optional fields and fields not found in the schema.

Here is my working example:

Input Data:

<?xml version='1.0'?>  
<root>  
    <items>  
        <id>1</id>  
        <name>first</name>  
    </items>  
    <items>  
        <id>2</id>  
        <name>second</name>  
    </items>  
    <items>  
        <id>3</id>  
        <name>optional</name>  
        <additional>true</additional>  
    </items>  
    <items>  
        <id>4</id>  
        <name>fourth</name>  
    </items>  
</root>

Data Flow script. Note that the second line where the source schema is defined, field "additional" is missing.

source(output(  
		root as (items as (id as short, name as string)[])  
	),  
	allowSchemaDrift: true,  
	validateSchema: false,  
	ignoreNoFilesFound: false,  
	format: 'xml',  
	fileSystem: 'data',  
	folderPath: 'input',  
	fileName: 'additional.xml',  
	validationMode: 'none',  
	namespaces: false) ~> source1  
source1 foldDown(unroll(root.items),  
	mapColumn(  
		every(match(true()))  
	),  
	skipDuplicateMapInputs: false,  
	skipDuplicateMapOutputs: false) ~> flatten1  
flatten1 sink(allowSchemaDrift: true,  
	validateSchema: false,  
	format: 'delimited',  
	fileSystem: 'data',  
	folderPath: 'out',  
	columnDelimiter: ',',  
	escapeChar: '\\',  
	quoteChar: '\"',  
	columnNamesAsHeader: true,  
	umask: 0022,  
	preCommands: [],  
	postCommands: [],  
	skipDuplicateMapInputs: true,  
	skipDuplicateMapOutputs: true) ~> sink1

Share via

How to parse XML files with optional fields in ADF?

0 additional answers

Your answer