XML Format Files (SQL Server)
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics Analytics Platform System (PDW)
SQL Server provides an XML schema that defines syntax for writing XML format files to use for bulk importing data into a SQL Server table. XML format files must adhere to this schema, which is defined in the XML Schema Definition Language (XSDL). XML format files are only supported when SQL Server tools are installed together with SQL Server Native Client.
You can use an XML format file with a bcp command, BULK INSERT statement, or INSERT ... SELECT * FROM OPENROWSET(BULK...) statement. The bcp command allows you to automatically generate an XML format file for a table; for more information, see bcp Utility.
Note
Two types of format files are supported for bulk exporting and importing: non-XML format files and XML format files. XML format files provide a flexible and powerful alternative to non-XML format files. For information about non-XML format files, see Non-XML Format Files (SQL Server).
Note
This syntax, including bulk insert, is not supported in Azure Synapse Analytics. In Azure Synapse Analytics and other cloud database platform integrations, accomplish data movement via the COPY statement in Azure Data Factory, or by using T-SQL statements such as COPY INTO and PolyBase.
XML format files are self-describing, making them easy to read, create, and extend. They are human readable, making it easy to understand how data is interpreted during bulk operations.
XML format files contain the data types of target columns. The XML encoding clearly describes the data types and data elements of the data file and also the mapping between data elements and table columns.
This enables separation between how data is represented in the data file and what data type is associated with each field in the file. For example, if a data file contains a character representation of the data, the corresponding SQL column type is lost.
An XML format file allows for loading of a field that contains a single large object (LOB) data type from a data file.
An XML format file can be enhanced yet remain compatible with its earlier versions. Furthermore, the clarity of XML encoding facilitates the creation of multiple format files for a given data file. This is useful if you have to map all or some of the data fields to columns in different tables or views.
The XML syntax is independent of the direction of the operation; that is, the syntax is the same for bulk export and bulk import.
You can use XML format files to bulk import data into tables or non-partitioned views and to bulk export data.
For the OPENROWSET(BULK...) function specifying a target table is optional. This is because the function relies on the XML format file to read data from a data file.
Note
A target table is necessary with the bcp command and the BULK INSERT statement, which uses the target table columns to do the type conversion.
Like a non-XML format file, an XML format file defines the format and structure of the data fields in a data file and maps those data fields to columns in a single target table.
An XML format file possesses two main components, <RECORD> and <ROW>:
<RECORD> describes the data as it is stored in the data file.
Each <RECORD> element contains a set of one or more <FIELD> elements. These elements correspond to fields in the data file. The basic syntax is as follows:
<RECORD>
<FIELD .../> [ ...n ]
</RECORD>
Each <FIELD> element describes the contents of a specific data field. A field can only be mapped to one column in the table. Not all fields need to be mapped to columns.
A field in a data file can be either of fixed/variable length or character terminated. A field value can be represented as: a character (using single-byte representation), a wide character (using Unicode two-byte representation), native database format, or a file name. If a field value is represented as a file name, the file name points to the file that contains the value of a BLOB column in the target table.
<ROW> describes how to construct data rows from a data file when the data from the file is imported into a SQL Server table.
A <ROW> element contains a set of <COLUMN> elements. These elements correspond to table columns. The basic syntax is as follows:
<ROW>
<COLUMN .../> [ ...n ]
</ROW>
Each <COLUMN> element can be mapped to only one field in the data file. The order of the <COLUMN> elements in the <ROW> element defines the order in which they are returned by the bulk operation. The XML format file assigns each <COLUMN> element a local name that has no relationship to the column in the target table of a bulk import operation.
This section contains a summary of the elements and attributes of the XML schema for XML format files. The syntax of a format file is independent of the direction of the operation; that is, the syntax is the same for bulk export and bulk import. This section also considers how bulk import uses the <ROW> and <COLUMN> elements and how to put the xsi:type value of an element into a data set.
To see how the syntax corresponds to actual XML format files, see sample XML format files.
Note
You can modify a format file to let you bulk import from a data file in which the number and/or order of the fields differ from the number and/or order of table columns. For more information, see Format Files for Importing or Exporting Data (SQL Server).
This syntax statements show only the elements (<BCPFORMAT>, <RECORD>, <FIELD>, <ROW>, and <COLUMN>) and their basic attributes.
Note
Additional attributes that are associated with the value of the xsi:type in a <FIELD> or <COLUMN> element are described later in this topic.
This section summarizes the purpose of each element that the XML schema defines for XML format files. The attributes are described in separate sections later in this topic.
<BCPFORMAT>
Is the format-file element that defines the record structure of a given data file and its correspondence to the columns of a table row in the table.
<RECORD .../>
Defines a complex element containing one or more <FIELD> elements. The order in which the fields are declared in the format file is the order in which those fields appear in the data file.
<FIELD .../>
Defines a field in data file, which contains data.
The attributes of this element are discussed in Attributes of the <FIELD> Element, later in this topic.
<ROW .../>
Defines a complex element containing one or more <COLUMN> elements. The order of the <COLUMN> elements is independent of the order of <FIELD> elements in a RECORD definition. Rather, the order of the <COLUMN> elements in a format file determines the column order of the resultant rowset. Data fields are loaded in the order in which the corresponding <COLUMN> elements are declared in the <COLUMN> element.
For more information, see How Bulk Import Uses the <ROW> Element, later in this topic.
<COLUMN>
Defines a column as an element (<COLUMN>). Each <COLUMN> element corresponds to a <FIELD> element (whose ID is specified in the SOURCE attribute of the <COLUMN> element).
The attributes of this element are discussed in Attributes of the <COLUMN> Element, later in this topic. Also see, How Bulk Import Uses the <COLUMN> Element, later in this topic.
</BCPFORMAT>
Required to end the format file.
This section describes the attributes of the <FIELD> element, which are summarized in the following schema syntax:
<FIELD
ID ="fieldID"
xsi:type ="fieldType"
[ LENGTH ="n" ]
[ PREFIX_LENGTH ="p" ]
[ MAX_LENGTH ="m" ]
[ COLLATION ="collationName" ]
[ TERMINATOR ="terminator" ]
/>
Each <FIELD> element is independent of the others. A field is described in terms of the following attributes:
FIELD Attribute | Description | Optional / Required |
---|---|---|
ID ="fieldID" | Specifies the logical name of the field in the data file. The ID of a field is the key used to refer to the field. <FIELD ID**="fieldID"/> maps to <COLUMN SOURCE="fieldID"**/> |
Required |
xsi:type ="fieldType" | This is an XML construct (used like an attribute) that identifies the type of the instance of the element. The value of fieldType determines which of the optional attributes (below) you need in a given instance. | Required (depending on the data type) |
LENGTH ="n" | This attribute defines the length for an instance of a fixed-length data type. The value of n must be a positive integer. |
Optional unless required by the xsi:type value |
PREFIX_LENGTH ="p" | This attribute defines the prefix length for a binary data representation. The PREFIX_LENGTH value, p, must be one of the following: 1, 2, 4, or 8. | Optional unless required by the xsi:type value |
MAX_LENGTH ="m" | This attribute is the maximum number of bytes that can be stored in a given field. Without a target table, the column max-length is not known. The MAX_LENGTH attribute restricts the maximum length of an output character column, limiting the storage allocated for the column value. This is especially convenient when using the OPENROWSET function's BULK option in a SELECT FROM clause. The value of m must be a positive integer. By default, the maximum length is 8000 characters for a char column and 4000 characters for an nchar column. |
Optional |
COLLATION ="collationName" | COLLATION is only allowed for character fields. For a list of the SQL collation names, see SQL Server Collation Name (Transact-SQL). | Optional |
TERMINATOR = "terminator" | This attribute specifies the terminator of a data field. The terminator can be any character. The terminator must be a unique character that is not part of the data. By default, the field terminator is the tab character (represented as \t). To represent a paragraph mark, use \r\n. |
Used only with an xsi:type of character data, which requires this attribute |
The xsi:type value is an XML construct (used like an attribute) that identifies the data type of an instance of an element. For information on using the "Putting the xsi:type Value into a Data Set," later in this section.
The xsi:type value of the <FIELD> element supports the following data types.
<FIELD> xsi:type values | Required XML Attribute(s) for Data Type |
Optional XML Attribute(s) for Data Type |
---|---|---|
NativeFixed | LENGTH | None. |
NativePrefix | PREFIX_LENGTH | MAX_LENGTH |
CharFixed | LENGTH | COLLATION |
NCharFixed | LENGTH | COLLATION |
CharPrefix | PREFIX_LENGTH | MAX_LENGTH, COLLATION |
NCharPrefix | PREFIX_LENGTH | MAX_LENGTH, COLLATION |
CharTerm | TERMINATOR | MAX_LENGTH, COLLATION |
NCharTerm | TERMINATOR | MAX_LENGTH, COLLATION |
For more information about Microsoft SQL Server data types, see Data Types (Transact-SQL).
This section describes the attributes of the <COLUMN> element, which are summarized in the following schema syntax:
<COLUMN
SOURCE = "fieldID"
NAME = "columnName"
xsi:type = "columnType"
[ LENGTH = "n" ]
[ PRECISION = "n" ]
[ SCALE = "value" ]
[ NULLABLE = { "YES"
"NO" } ]
/>
A field is mapped to a column in the target table using the following attributes:
COLUMN Attribute | Description | Optional / Required |
---|---|---|
SOURCE ="fieldID" | Specifies the ID of the field being mapped to the column. <COLUMN SOURCE**="fieldID"/> maps to <FIELD ID="fieldID"**/> |
Required |
NAME = "columnName" | Specifies the name of the column in the row set represented by the format file. This column name is used to identify the column in the result set, and it need not correspond to the column name used in the target table. | Required |
xsi**:type ="ColumnType"** | This is an XML construct (used like an attribute) that identifies the data type of the instance of the element. The value of ColumnType determines which of the optional attributes (below) you need in a given instance. Note: The possible values of ColumnType and their associated attributes are listed in the <COLUMN> element table in the Xsi:type values of the <COLUMN> Element section. |
Optional |
LENGTH ="n" | Defines the length for an instance of a fixed-length data type. LENGTH is used only when the xsi:type is a string data type. The value of n must be a positive integer. |
Optional (available only if the xsi:type is a string data type) |
PRECISION ="n" | Indicates the number of digits in a number. For example, the number 123.45 has a precision of 5. The value must be a positive integer. |
Optional (available only if the xsi:type is a variable-number data type) |
SCALE ="int" | Indicates the number of digits to the right of the decimal point in a number. For example, the number 123.45 has a scale of 2. The value must be an integer. |
Optional (available only if the xsi:type is a variable-number data type) |
NULLABLE = { "YES" "NO" } |
Indicates whether a column can assume NULL values. This attribute is completely independent of FIELDS. However, if a column is not NULLABLE and field specifies NULL (by not specifying any value), a run-time error results. The NULLABLE attribute is used only if you do a plain SELECT FROM OPENROWSET(BULK...) statement. |
Optional (available for any data type) |
The xsi:type value is an XML construct (used like an attribute) that identifies the data type of an instance of an element. For information on using the "Putting the xsi:type Value into a Data Set," later in this section.
The <COLUMN> element supports native SQL data types, as follows:
Type Category | <COLUMN> Data Types | Required XML Attribute(s) for Data Type |
Optional XML Attribute(s) for Data Type |
---|---|---|---|
Fixed | SQLBIT, SQLTINYINT, SQLSMALLINT, SQLINT, SQLBIGINT, SQLFLT4, SQLFLT8, SQLDATETIME, SQLDATETIM4, SQLDATETIM8, SQLMONEY, SQLMONEY4, SQLVARIANT, and SQLUNIQUEID | None. | NULLABLE |
Variable Number | SQLDECIMAL and SQLNUMERIC | None. | NULLABLE, PRECISION, SCALE |
LOB | SQLIMAGE, CharLOB, SQLTEXT, and SQLUDT | None. | NULLABLE |
Character LOB | SQLNTEXT | None. | NULLABLE |
Binary string | SQLBINARY and SQLVARYBIN | None. | NULLABLE, LENGTH |
Character string | SQLCHAR, SQLVARYCHAR, SQLNCHAR, and SQLNVARCHAR | None. | NULLABLE, LENGTH |
Important
To bulk export or import SQLXML data, use one of the following data types in your format file: SQLCHAR or SQLVARYCHAR (the data is sent in the client code page or in the code page implied by the collation), SQLNCHAR or SQLNVARCHAR (the data is sent as Unicode), or SQLBINARY or SQLVARYBIN (the data is sent without any conversion).
For more information about SQL Server data types, see Data Types (Transact-SQL).
The <ROW> element is ignored in some contexts. Whether the <ROW> element affects a bulk-import operation depends on how the operation is performed:
The bcp command
When data is loaded into a target table, bcp ignores the <ROW> component. Instead, bcp loads the data based on the column types of the target table.
Transact-SQL statements (BULK INSERT and OPENROWSET's Bulk rowset provider)
When bulk importing data into a table, Transact-SQL statements use the <ROW> component to generate the input rowset. Also, Transact-SQL statements perform appropriate type conversions based on the column types specified under <ROW> and the corresponding column in the target table. If a mismatch exists between column types as specified in the format file and in the target table, an extra type conversion occurs. This extra type conversion may lead to some discrepancy (that is, a loss of precision) in behavior in BULK INSERT or OPENROWSET's Bulk rowset provider as compared to bcp.
The information in the <ROW> element allows a row to be constructed without requiring any additional information. For this reason, you can generate a rowset using a SELECT statement (SELECT * FROM OPENROWSET(BULK datafile FORMATFILE=xmlformatfile).
Note
The OPENROWSET BULK clause requires a format file (note that converting from the data type of the field to the data type of a column is available only with an XML format file).
For bulk importing data into a table, the <COLUMN> elements in a format file map a data-file field to table columns by specifying:
The position of each field within a row in the data file.
The column type, which is used to convert the field data type to the desired column data type.
If no column is mapped to a field, the field is not copied into the generated row(s). This behavior allows a data file to generate rows with different columns (in different tables).
Similarly, for bulk exporting data from a table, each <COLUMN> in the format file maps the column from the input table row to its corresponding field in the output data file.
When an XML document is validated through the XML Schema Definition (XSD) language, the xsi:type value is not put into the data set. However, you can put the xsi:type information into the data set by loading the XML format file into an XML document (for example, myDoc
), as illustrated in the following code snippet:
...;
myDoc.LoadXml(xmlFormat);
XmlNodeList ColumnList = myDoc.GetElementsByTagName("COLUMN");
for(int i=0;i<ColumnList.Count;i++)
{
Console.Write("COLUMN: xsi:type=" +ColumnList[i].Attributes["type",
"http://www.w3.org/2001/XMLSchema-instance"].Value+"\n");
}
This section contains information on using XML format files in a variety of cases, including an Adventure Works example.
Note
In the data files shown in the following examples, <tab>
indicates a tab character in a data file, and <return>
indicates a carriage return.
The examples illustrate key aspects of using XML format files, as follows:
Note
For information about how to create format files, see Create a Format File (SQL Server).
The following example shows an XML format file that describes a data file containing three fields of character data. The format file maps the data file to a table that contains three columns. The data fields correspond one-to-one with the columns of the table.
Table (row): Person (Age int, FirstName varchar(20), LastName varchar(30))
Data file (record): Age<tab>Firstname<tab>Lastname<return>
The following XML format file reads from the data file to the table.
In the <RECORD>
element, the format file represents the data values in all three fields as character data. For each field, the TERMINATOR
attribute indicates the terminator that follows the data value.
The data fields correspond one-to-one with the columns of the table. In the <ROW>
element, the format file maps the column Age
to the first field, the column FirstName
to the second field, and the column LastName
to the third field.
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"
MAX_LENGTH="30"
COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="age" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="firstname" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="lastname" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Note
For an equivalent AdventureWorks2022
example, see Create a Format File (SQL Server).
The following example shows an XML format file that describes a data file containing three fields of character data. The format file maps the data file to a table that contains three columns that are ordered differently from the fields of the data file.
Table (row): Person (Age int, FirstName varchar(20), LastName varchar(30))
Data file (record): Age<tab>Lastname<tab>Firstname<return>
In the <RECORD>
element, the format file represents the data values in all three fields as character data.
In the <ROW>
element, the format file maps the column Age
to the first field, the column FirstName
to the third field, and the column LastName
to the second field.
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="20"
COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"
MAX_LENGTH="30" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="age" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="firstname" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="lastname" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Note
For an equivalent AdventureWorks2022
example, see Use a Format File to Map Table Columns to Data-File Fields (SQL Server).
The following example shows an XML format file that describes a data file containing four fields of character data. The format file maps the data file to a table that contains three columns. The second data field does not correspond to any table column.
Table (row): Person (Age int, FirstName Varchar(20), LastName Varchar(30))
Data file (record): Age<tab>employeeID<tab>Firstname<tab>Lastname<return>
In the <RECORD>
element, the format file represents the data values in all four fields as character data. For each field, the TERMINATOR attribute indicates the terminator that follows the data value.
In the <ROW>
element, the format file maps the column Age
to the first field, the column FirstName
to the third field, and the column LastName
to the fourth field.
<?xml version = "1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="10"
COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="20"
COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n"
MAX_LENGTH="30"
COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="age" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="firstname" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="lastname" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Note
For an equivalent AdventureWorks2022
example, see Use a Format File to Skip a Data Field (SQL Server).
The following example shows different types of fields and their mappings to columns.
<?xml version = "1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD xsi:type="CharTerm" ID="C1" TERMINATOR="\t"
MAX_LENGTH="4"/>
<FIELD xsi:type="CharFixed" ID="C2" LENGTH="10"
COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS"/>
<FIELD xsi:type="CharPrefix" ID="C3" PREFIX_LENGTH="2"
MAX_LENGTH="32" COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS"/>
<FIELD xsi:type="NCharTerm" ID="C4" TERMINATOR="\t"
MAX_LENGTH="4"/>
<FIELD xsi:type="NCharFixed" ID="C5" LENGTH="10"
COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS"/>
<FIELD xsi:type="NCharPrefix" ID="C6" PREFIX_LENGTH="2"
MAX_LENGTH="32" COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS"/>
<FIELD xsi:type="NativeFixed" ID="C7" LENGTH="4"/>
</RECORD>
<ROW>
<COLUMN SOURCE="C1" NAME="Age" xsi:type="SQLTINYINT"/>
<COLUMN SOURCE="C2" NAME="FirstName" xsi:type="SQLVARYCHAR"
LENGTH="16" NULLABLE="NO"/>
<COLUMN SOURCE="C3" NAME="LastName" />
<COLUMN SOURCE="C4" NAME="Salary" xsi:type="SQLMONEY"/>
<COLUMN SOURCE="C5" NAME="Picture" xsi:type="SQLIMAGE"/>
<COLUMN SOURCE="C6" NAME="Bio" xsi:type="SQLTEXT"/>
<COLUMN SOURCE="C7" NAME="Interest"xsi:type="SQLDECIMAL"
PRECISION="5" SCALE="3"/>
</ROW>
</BCPFORMAT>
The following example creates an empty two-column table (t_xml
), in which the first column maps to the int
data type and the second column maps to the xml
data type.
CREATE TABLE t_xml (c1 int, c2 xml)
The following XML format file would load a data file into table t_xml
.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NativePrefix" PREFIX_LENGTH="1"/>
<FIELD ID="2" xsi:type="NCharPrefix" PREFIX_LENGTH="8"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="c1" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="c2" xsi:type="SQLNCHAR"/>
</ROW>
</BCPFORMAT>
The following example describes fixed fields of 10
or 6
characters each. The format file represents these field lengths/widths as LENGTH="10"
and LENGTH="6"
, respectively. Every row of the data files ends with a carriage return-line feed combination, {CR}{LF}, which the format file represents as TERMINATOR="\r\n"
.
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="10"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="6"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="C1" xsi:type="SQLINT" />
<COLUMN SOURCE="2" NAME="C2" xsi:type="SQLINT" />
</ROW>
</BCPFORMAT>
For additional examples of both non-XML format files and XML format files, see the following topics: