XML format in Data Factory in Microsoft Fabric
This article outlines how to configure XML format in the data pipeline of Data Factory in Microsoft Fabric.
XML format is supported for the following activities and connectors as source.
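Category | Connector/Activity |
---|---|
Supported connector | Amazon S3 |
Supported connector | Amazon S3 Compatible |
Supported connector | Azure Blob Storage |
Supported connector | Azure Data Lake Storage Gen1 |
Supported connector | Azure Data Lake Storage Gen2 |
Supported connector | Azure Files |
Supported connector | File system |
Supported connector | FTP |
Supported connector | Google Cloud Storage |
Supported connector | HTTP |
Supported connector | Lakehouse Files |
Supported connector | Oracle Cloud Storage |
Supported connector | SFTP |
Supported activity | Copy activity (source/-) |
Supported activity | Lookup activity |
Supported activity | GetMetadata activity |
Supported activity | Delete activity |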
To configure XML format, choose your connection in the source of data pipeline copy activity, and then select XML in the drop-down list of File format. Select Settings for further configuration of this format.
After you select Settings in the File format section, the following properties are shown in the pop-up File format settings dialog box.
Compression type: The compression codec used to read XML files. You can choose None, bzip2, gzip, deflate, ZipDeflate, TarGZip, or tar from the drop-down list.
If you select ZipDeflate as the compression type, Preserve zip file name as folder is displayed under the Advanced settings in the Source tab.
- Preserve zip file name as folder: Indicates whether to preserve the source zip file name as a folder structure during copy.
  - If this box is checked (default), the service writes unzipped files to <specified file path>/<folder named as source zip file>/.
  - If this box is unchecked, the service writes unzipped files directly to <specified file path>. Make sure that different source zip files don't contain duplicate file names, to avoid race conditions or unexpected behavior.
If you select TarGZip/tar as the compression type, Preserve compression file name as folder is displayed under the Advanced settings in the Source tab.
- Preserve compression file name as folder: Indicates whether to preserve the source compressed file name as a folder structure during copy.
  - If this box is checked (default), the service writes decompressed files to <specified file path>/<folder named as source compressed file>/.
  - If this box is unchecked, the service writes decompressed files directly to <specified file path>. Make sure that different source files don't contain duplicate file names, to avoid race conditions or unexpected behavior.
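The effect of the two "preserve ... file name as folder" options on destination paths can be sketched as follows. This is an illustration only, not the service's implementation: the helper names destination_path and find_collisions are hypothetical, and the sketch assumes the folder is named exactly like the archive file (the article does not say whether the extension is kept).

```python
from collections import defaultdict
import posixpath


def destination_path(base_path, archive_name, entry_name,
                     preserve_archive_name_as_folder=True):
    """Return where one extracted entry would land under the chosen setting."""
    if preserve_archive_name_as_folder:
        # Assumption: the folder carries the archive's file name unchanged.
        return posixpath.join(base_path, archive_name, entry_name)
    return posixpath.join(base_path, entry_name)


def find_collisions(base_path, archives, preserve_archive_name_as_folder):
    """Map each destination written by more than one archive to those archives."""
    writers = defaultdict(list)
    for archive_name, entries in archives.items():
        for entry_name in entries:
            dest = destination_path(base_path, archive_name, entry_name,
                                    preserve_archive_name_as_folder)
            writers[dest].append(archive_name)
    return {dest: names for dest, names in writers.items() if len(names) > 1}


# Two archives that both contain a file called orders.xml.
archives = {"a.zip": ["orders.xml"], "b.zip": ["orders.xml"]}
print(find_collisions("out", archives, True))
print(find_collisions("out", archives, False))
```

With the option checked, each archive gets its own folder and no destination is shared. With it unchecked, both archives target out/orders.xml, which is the duplicate-name situation the warning describes.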
Compression level: Specify the compression ratio when you select a compression type. You can choose from Fastest or Optimal.
- Fastest: The compression operation should complete as quickly as possible, even if the resulting file is not optimally compressed.
- Optimal: The compression operation should be optimally compressed, even if the operation takes a longer time to complete. For more information, see the Compression Level topic.
Encoding: Specify the encoding type used to read text files. Select one type from the drop-down list. The default value is UTF-8.
Null value: Specifies the string representation of a null value. The default value is an empty string.
Under Advanced settings in the Source tab, the following XML format related properties are displayed.
Validation mode: Specifies whether to validate the XML schema. Select one mode from the drop-down list.
- None: Select this to not use validation mode.
- xsd: Select this to validate the XML schema using XSD.
- dtd: Select this to validate the XML schema using DTD.
Namespaces: Specify whether to enable namespace when parsing the XML files. It is selected by default.
Namespace prefix pairs: If Namespaces is enabled, select + New and specify the URL and Prefix. You can add more pairs by selecting + New.
The namespace URI to prefix mapping is used to name fields when parsing the XML file. If an XML file has a namespace and namespaces are enabled, by default the field name is the same as it is in the XML document. If there is an item defined for the namespace URI in this map, the field name is prefix:fieldName.
Detect data type: Specify whether to detect integer, double, and Boolean data types. It is selected by default.
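The field-naming rule for namespace prefix pairs can be illustrated with a small sketch. It is not the service's parser: the function field_names and the example URIs are hypothetical, and the sketch assumes each namespace URI is declared with a single prefix in the document.

```python
import io
import xml.etree.ElementTree as ET


def field_names(xml_text, namespace_prefixes=None):
    """List element names, applying a namespace-URI-to-prefix map when one matches."""
    namespace_prefixes = namespace_prefixes or {}
    doc_prefixes = {}  # namespace URI -> prefix as declared in the document
    names = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            doc_prefixes[uri] = prefix
            continue
        tag = payload.tag
        # ElementTree reports namespaced tags as "{uri}localName".
        if tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
        else:
            uri, local = None, tag
        if uri in namespace_prefixes:
            # A configured pair exists: the field becomes prefix:fieldName.
            names.append(f"{namespace_prefixes[uri]}:{local}")
        elif uri and doc_prefixes.get(uri):
            # No pair configured: keep the name as written in the document.
            names.append(f"{doc_prefixes[uri]}:{local}")
        else:
            names.append(local)
    return names


doc = '<o:order xmlns:o="http://example.com/order"><o:id>1</o:id></o:order>'
print(field_names(doc))
print(field_names(doc, {"http://example.com/order": "ord"}))
```

The first call keeps the document's own names (o:order, o:id). The second call maps the same URI to the prefix ord, so the fields are named ord:order and ord:id.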
The following properties are supported in the copy activity Source section when using XML format.
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
File format | The file format that you want to use. | XML | Yes | type (under datasetSettings): Xml |
Compression type | The compression codec used to read XML files. | None, bzip2, gzip, deflate, ZipDeflate, TarGZip, tar | No | type (under compression): bzip2, gzip, deflate, ZipDeflate, TarGZip, tar |
Compression level | The compression ratio. | Fastest, Optimal | No | level (under compression): Fastest, Optimal |
Encoding | The encoding type used to read text files. | "UTF-8" (by default), "UTF-8 without BOM", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE", "US-ASCII", "UTF-7", "BIG5", "EUC-JP", "EUC-KR", "GB2312", "GB18030", "JOHAB", "SHIFT-JIS", "CP875", "CP866", "IBM00858", "IBM037", "IBM273", "IBM437", "IBM500", "IBM737", "IBM775", "IBM850", "IBM852", "IBM855", "IBM857", "IBM860", "IBM861", "IBM863", "IBM864", "IBM865", "IBM869", "IBM870", "IBM01140", "IBM01141", "IBM01142", "IBM01143", "IBM01144", "IBM01145", "IBM01146", "IBM01147", "IBM01148", "IBM01149", "ISO-2022-JP", "ISO-2022-KR", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-13", "ISO-8859-15", "WINDOWS-874", "WINDOWS-1250", "WINDOWS-1251", "WINDOWS-1252", "WINDOWS-1253", "WINDOWS-1254", "WINDOWS-1255", "WINDOWS-1256", "WINDOWS-1257", "WINDOWS-1258" | No | encodingName |
Preserve zip file name as folder | Indicates whether to preserve the source zip file name as a folder structure during copy. | Selected (default) or unselected | No | preserveZipFileNameAsFolder (under compressionProperties -> type as ZipDeflateReadSettings): true (default) or false |
Preserve compression file name as folder | Indicates whether to preserve the source compressed file name as a folder structure during copy. | Selected (default) or unselected | No | preserveCompressionFileNameAsFolder (under compressionProperties -> type as TarGZipReadSettings or TarReadSettings): true (default) or false |
Null value | The string representation of null value. | <your null value>, empty string (by default) | No | nullValue |
Validation mode | Whether to validate the XML schema. | None, xsd, dtd | No | validationMode: xsd, dtd |
Namespaces | Whether to enable namespace when parsing the XML files. | Selected (default) or unselected | No | namespaces: true (default) or false |
Namespace prefix pairs | Namespace URI to prefix mapping, which is used to name fields when parsing the XML file. If an XML file has namespace and namespace is enabled, by default, the field name is the same as it is in the XML document. If there is an item defined for the namespace URI in this map, the field name is prefix:fieldName. | <url>:<prefix> | No | namespacePrefixes: <url>:<prefix> |
Detect data type | Whether to detect integer, double, and Boolean data types. | Selected (default) or unselected | No | detectDataType: true (default) or false |
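As a rough sketch, the JSON script properties in the table can be assembled and checked against their allowed values like this. The helper build_xml_source_properties is hypothetical. Only the nesting that the table states is reflected (type under datasetSettings, type and level under compression, the preserve flags under compressionProperties). Where the remaining keys sit in the full pipeline JSON, and the use of a JSON object for namespacePrefixes, are assumptions.

```python
import json

COMPRESSION_TYPES = ("bzip2", "gzip", "deflate", "ZipDeflate", "TarGZip", "tar")
COMPRESSION_LEVELS = ("Fastest", "Optimal")
VALIDATION_MODES = ("xsd", "dtd")

# Archive codecs -> (read-settings type, preserve-name property), as named in the table.
ARCHIVE_READ_SETTINGS = {
    "ZipDeflate": ("ZipDeflateReadSettings", "preserveZipFileNameAsFolder"),
    "TarGZip": ("TarGZipReadSettings", "preserveCompressionFileNameAsFolder"),
    "tar": ("TarReadSettings", "preserveCompressionFileNameAsFolder"),
}


def _check(name, value, allowed):
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


def build_xml_source_properties(compression_type=None, compression_level=None,
                                encoding_name="UTF-8", null_value="",
                                validation_mode=None, namespaces=True,
                                namespace_prefixes=None, detect_data_type=True,
                                preserve_name_as_folder=True):
    """Collect XML source properties in one dict (flat placement is an assumption)."""
    props = {
        "datasetSettings": {"type": "Xml"},
        "encodingName": encoding_name,
        "nullValue": null_value,
        "namespaces": namespaces,
        "detectDataType": detect_data_type,
    }
    if compression_type is not None:
        _check("compression_type", compression_type, COMPRESSION_TYPES)
        props["compression"] = {"type": compression_type}
        if compression_level is not None:
            _check("compression_level", compression_level, COMPRESSION_LEVELS)
            props["compression"]["level"] = compression_level
        if compression_type in ARCHIVE_READ_SETTINGS:
            settings_type, flag = ARCHIVE_READ_SETTINGS[compression_type]
            props["compressionProperties"] = {"type": settings_type,
                                              flag: preserve_name_as_folder}
    if validation_mode is not None:
        _check("validation_mode", validation_mode, VALIDATION_MODES)
        props["validationMode"] = validation_mode
    if namespaces and namespace_prefixes:
        # Assumption: pairs are expressed as a JSON object of URI -> prefix.
        props["namespacePrefixes"] = dict(namespace_prefixes)
    return props


example = build_xml_source_properties(
    compression_type="ZipDeflate",
    validation_mode="xsd",
    namespace_prefixes={"http://example.com/order": "ord"},
    preserve_name_as_folder=False,
)
print(json.dumps(example, indent=2))
```

Validating the values before building the dictionary catches misspelled codec or mode names early. Compare the printed result with the JSON code view of an actual pipeline before relying on the key placement.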