Generating XML Documents from XML Schemas

 

Priya Lakshminarayanan
Microsoft Corporation

August 2004

Applies to:
   Microsoft .NET Framework
   XML documents and schemas

Summary: Priya Lakshminarayanan shows how you can use the classes in the System.Xml.Schema namespace of the Microsoft .NET Framework to build a tool that generates sample XML documents that conform to a given schema. (16 printed pages)

The code sample for this article is available for download.

Contents

Introduction
Overview of the XmlSampleGenerator API
Nuts and Bolts of the XmlSampleGenerator
Working with the XmlSampleGenerator
Limitations of the XmlSampleGenerator
Conclusion

Introduction

The W3C XML Schema recommendation has been widely adopted in the XML world as the standard way to describe the structure and content of an XML document. Authors who are unfamiliar with writing XML schemas from scratch often need to generate a schema from an existing XML document. Several tools in the Microsoft .NET Framework can infer a schema from a sample XML document, including xsd.exe in the .NET Framework SDK, the ADO.NET DataSet, and the Inference class in the System.Xml.Schema namespace of the .NET Framework 2.0 Beta 1. However, few tools on Microsoft platforms go the other way and generate a sample XML document from a given schema.

This article describes a tool that generates an XML document from a given schema, and assumes that the reader is familiar with the W3C XML Schema specification. Such a tool is useful in several scenarios: designing a schema progressively by inspecting the generated instance, generating random samples as test cases for applications that consume XML, and so on. This tool is aimed at the first of these: schema design.

Note   The code sample for this article requires the .NET Framework v2.0 beta 1 or higher.

Goals of the XML Generator

  1. To generate an XML sample that is easy to read and illustrates the use of various constructs in the given XML Schema.
  2. The generated document should be valid with respect to the schema. If validity cannot be achieved, this should be signaled by adding comments to the generated document.
  3. The generation should be deterministic—that is, the same schema will generate the same instance document.

Overview of the XmlSampleGenerator API

Constructors

There are three constructor overloads that can be used to create the XmlSampleGenerator:

  1. An overload that takes the schema file name and the qualified name of the root element of the XML document.

    public XmlSampleGenerator(string url, XmlQualifiedName rootElem)
    
  2. An overload that takes an XmlSchema object and the qualified name of the root element of the XML document.

    public XmlSampleGenerator(XmlSchema schema, XmlQualifiedName rootElem)
    
  3. An overload that takes an XmlSchemaSet object and the qualified name of the root element of the XML document.

    public XmlSampleGenerator(XmlSchemaSet schemaSet, XmlQualifiedName rootElem)
    

The second and third overloads will be useful in integrating this tool with other utilities where the user might already have an XmlSchema or XmlSchemaSet that holds the application's schemas.

Properties

There are three properties exposed on the XmlSampleGenerator:

  1. XmlResolver

    public XmlResolver XmlResolver { set; }

    Use this property to supply a custom resolver. The resolver is used to resolve the schemaLocation attributes of import, include, and redefine elements when schemas are loaded into the XmlSchemaSet. By default, an XmlUrlResolver is used.

  2. MaxThreshold

    public int MaxThreshold { get; set; }

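    This property caps the number of times a repeating particle is generated. If an element has maxOccurs="100" or maxOccurs="unbounded", setting this property to 10 generates that element 10 times. An element whose maxOccurs is smaller than the threshold, such as ShipTo with maxOccurs="2" in the example below, is generated maxOccurs times. The default value for this property is 5.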

  3. ListLength

    public int ListLength { get; set; }

    This property defines how many items should be generated for a simple type of variety list. The default value of this property is 3.
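The XmlResolver property matters when a schema pulls in other documents through xs:import, xs:include, or xs:redefine. The following C# sketch shows the underlying mechanism directly on an XmlSchemaSet rather than through the XmlSampleGenerator. It uses a resolver that serves schema documents from memory. The class names InMemorySchemaResolver and ResolverDemo, and the URI http://example.com/schemas/common.xsd, are hypothetical. The sketch assumes that XmlSchemaSet.XmlResolver is consulted when includes are resolved, as it is in the .NET Framework 2.0 and later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Hypothetical resolver: serves schema documents from memory, keyed by absolute URI,
// and falls back to normal URL resolution for anything it does not know.
public class InMemorySchemaResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> documents = new Dictionary<string, string>();

    public void Add(string absoluteUri, string schemaText)
    {
        documents[absoluteUri] = schemaText;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string text;
        if (documents.TryGetValue(absoluteUri.AbsoluteUri, out text))
        {
            // Return the schema text as a Stream, the default entity type for resolvers.
            return new MemoryStream(Encoding.UTF8.GetBytes(text));
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

public static class ResolverDemo
{
    // Hypothetical location of a shared schema that the main schema includes.
    public const string CommonUri = "http://example.com/schemas/common.xsd";

    public static XmlSchemaSet LoadWithInclude()
    {
        var resolver = new InMemorySchemaResolver();
        resolver.Add(CommonUri,
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' targetNamespace='http://tempuri.org'>" +
            "<xsd:simpleType name='zipIntType'><xsd:restriction base='xsd:integer'/></xsd:simpleType>" +
            "</xsd:schema>");

        string mainSchema =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' targetNamespace='http://tempuri.org'>" +
            "<xsd:include schemaLocation='" + CommonUri + "'/>" +
            "</xsd:schema>";

        var set = new XmlSchemaSet();
        set.XmlResolver = resolver;  // consulted for import/include/redefine schemaLocations
        set.Add(null, XmlReader.Create(new StringReader(mainSchema)));
        set.Compile();
        return set;
    }
}
```

A resolver like this could be assigned to the generator's XmlResolver property in the same way. For example, it could load schemas from a database or from embedded resources instead of the file system.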

Methods

  1. WriteXml

    public void WriteXml(XmlWriter writer)

The WriteXml() method writes the generated XML to the XmlWriter it is given. Because the output goes through an XmlWriter, users can write the XML to a file, a stream, or any other store that the writer targets.

The following is a simple purchaseOrder schema, with a top-level PurchaseOrder element that has two child elements, ShipTo and BillTo, and one attribute, OrderDate.

Po.xsd:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://tempuri.org" targetNamespace="http://tempuri.org" elementFormDefault="qualified">
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:USAddress"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:integer"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>
</xsd:schema>

The following code will generate the po.xml instance that conforms to the above schema.

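using System.Xml;
// XmlSampleGenerator is defined in the code sample that accompanies this article.

XmlTextWriter textWriter = new XmlTextWriter("po.xml", null);
textWriter.Formatting = Formatting.Indented;
XmlQualifiedName qname = new XmlQualifiedName("PurchaseOrder", "http://tempuri.org");
XmlSampleGenerator generator = new XmlSampleGenerator("po.xsd", qname);
generator.WriteXml(textWriter);
textWriter.Close();  // flush the generated document to po.xml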

The generated document is shown below.

Po.xml:
<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo country="US">
    <name>name1</name>
    <street>street1</street>
    <city>city1</city>
    <state>state1</state>
    <zip>1</zip>
  </ShipTo>
  <ShipTo country="US">
    <name>name2</name>
    <street>street2</street>
    <city>city2</city>
    <state>state2</state>
    <zip>-79228162514264337593543950335</zip>
  </ShipTo>
  <BillTo country="US">
    <name>name1</name>
    <street>street1</street>
    <city>city1</city>
    <state>state1</state>
    <zip>1</zip>
  </BillTo>
</PurchaseOrder>

Nuts and Bolts of the XmlSampleGenerator

The XmlSchemaSet class (this replaces the XmlSchemaCollection in .NET Framework 2.0) is used as the in-memory schema cache to load and compile schemas. Once the schemas are loaded into the schema set and compiled, the XML is generated by walking the content model of each element beginning at the root element.

Starting at the root element specified by the user, the generator walks the Schema Object Model (SOM) and builds an instance tree that mirrors the schema tree. The nodes in the instance tree are annotated with additional information, such as whether an element node has a default or a fixed value. This information is used when the XML is actually written with the XmlWriter. The instance tree is created by applying the rules below while traversing the content model.
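The following is a minimal sketch of the loading step. It assumes the schema text is available as a string. The schemas are added to an XmlSchemaSet and compiled, and then the root element is looked up in the GlobalElements table. Compilation populates post-compilation properties such as ElementSchemaType and ContentTypeParticle, and those properties are what make a walk of the content model possible. SchemaLoading, FindRoot, and ChildNames are hypothetical names; they are not part of the sample.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaLoading
{
    // Loads the schema text into an XmlSchemaSet, compiles it, and returns the
    // compiled global element from which the instance tree would be built.
    public static XmlSchemaElement FindRoot(string schemaText, XmlQualifiedName rootName)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(schemaText)));
        set.Compile();  // populates ElementSchemaType, ContentTypeParticle, and so on

        var root = set.GlobalElements[rootName] as XmlSchemaElement;
        if (root == null)
            throw new XmlSchemaException("Root element " + rootName + " is not declared.");
        return root;
    }

    // Returns the names of the elements in the root's top-level sequence, the first
    // step of a content-model walk. Other content models yield an empty list here.
    public static List<string> ChildNames(XmlSchemaElement root)
    {
        var names = new List<string>();
        var complexType = root.ElementSchemaType as XmlSchemaComplexType;
        if (complexType == null) return names;
        var sequence = complexType.ContentTypeParticle as XmlSchemaSequence;
        if (sequence == null) return names;
        foreach (XmlSchemaObject item in sequence.Items)
        {
            var child = item as XmlSchemaElement;
            if (child != null) names.Add(child.QualifiedName.Name);
        }
        return names;
    }
}
```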

Content Model Generation

  • Mixed content: If mixed="true", a text node with the value "text" is generated as the first child of the element.
  • Sequence, Choice, All:
    • Every child of a sequence is generated according to its minOccurs and maxOccurs, and the sequence itself is repeated according to its own minOccurs and maxOccurs. The MaxThreshold property caps the maxOccurs value specified in the schema.
    • If the choice occurs only once, then the first particle within the choice is generated; otherwise we cyclically iterate over all the children in the choice.
    • xs:all is generated like a sequence, except that the elements are generated in reverse order. This is valid because xs:all does not constrain the order of its children.
  • Wildcards: All element wildcards are mapped to an XmlSchemaElement by locating an element in the schema set's GlobalElements table if it satisfies the namespace constraint of the <xs:any>. All attribute wildcards are similarly mapped to an XmlSchemaAttribute by looking up the schema set's GlobalAttributes table.
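The occurrence and ordering rules above can be sketched as small helper functions. This is an illustration under stated assumptions, not the sample's code. The names are hypothetical, particles are represented only by their names, and the case where minOccurs exceeds MaxThreshold is ignored. In the Schema Object Model, XmlSchemaParticle.MaxOccurs reports maxOccurs="unbounded" as decimal.MaxValue, and the sketch relies on that convention.

```csharp
using System.Collections.Generic;

// Hypothetical helpers illustrating the content-model rules; not the sample's actual code.
public static class ContentModelRules
{
    // Number of times a particle is emitted: its maxOccurs, capped by MaxThreshold.
    // decimal.MaxValue stands for maxOccurs="unbounded".
    public static int Occurrences(decimal maxOccurs, int maxThreshold)
    {
        return maxOccurs > maxThreshold ? maxThreshold : (int)maxOccurs;
    }

    // A choice that occurs once yields its first child; a repeated choice
    // cycles through its children in order.
    public static List<string> ExpandChoice(IList<string> children, int occurrences)
    {
        var result = new List<string>();
        for (int i = 0; i < occurrences; i++)
            result.Add(children[i % children.Count]);
        return result;
    }

    // xs:all: same children as a sequence, emitted in reverse order.
    public static List<string> ExpandAll(IList<string> children)
    {
        var result = new List<string>(children);
        result.Reverse();
        return result;
    }
}
```

With the default MaxThreshold of 5, Occurrences(2, 5) gives the two ShipTo elements in the earlier example, and an unbounded particle would be generated 5 times.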

Value Generation

Simple type values are generated by creating a value generator for the corresponding datatype of the element or attribute. The value generator is aware of any facets declared on the simple type and will generate values that will be valid according to the type's facets. The XmlSampleGenerator supports all facets except the xs:pattern facet.

The tool supports all the built-in datatypes defined in W3C XML Schema Part 2: Datatypes except xs:ENTITY, xs:ENTITIES, and xs:NOTATION.

The following rules are applied while generating values:

  • Values of type xs:string are generated by appending a numeric counter to the element name. The length of the generated string is adjusted to satisfy any length, minLength, and maxLength facets.
  • Values of string-derived types such as xs:NCName and xs:token use the type name followed by a numeric counter, which distinguishes them from xs:string values.
  • Numeric values are generated within the range of values of the corresponding CLR type. Every type has a default start value (for example, 1 for xs:integer and 1900-01-01 for xs:date) that is adjusted when facets such as minInclusive and minExclusive are present.
  • If the element or attribute has a default value, that is used in the generation as the first value.
  • If the element or attribute has a fixed value, only this value is used in the generation.
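The string rule can be sketched as follows. Padding with underscores reproduces the street______1 values in the next section, where streetType has minLength="13". How the sample handles maxLength is not described here, so the trimming branch is an assumption: it shortens the name and keeps the counter. StringValues.Generate is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class StringValues
{
    // Hypothetical helper: element name + counter, adjusted to the length facets.
    // A null argument means that facet is absent; length fixes both bounds.
    public static string Generate(string name, int counter, int? length, int? minLength, int? maxLength)
    {
        int min = length ?? minLength ?? 0;
        int max = length ?? maxLength ?? int.MaxValue;
        string suffix = counter.ToString(CultureInfo.InvariantCulture);
        string value = name + suffix;

        if (value.Length < min)
        {
            // Pad between the name and the counter, e.g. "street" + "______" + "1".
            value = name + new string('_', min - value.Length) + suffix;
        }
        else if (value.Length > max)
        {
            // Assumption: trim the name first so that values stay distinct by counter.
            int keep = Math.Max(0, max - suffix.Length);
            value = name.Substring(0, Math.Min(name.Length, keep)) + suffix;
            if (value.Length > max)
                value = value.Substring(value.Length - max);  // the counter alone is too long
        }
        return value;
    }
}
```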

Working with the XmlSampleGenerator

Now let us generate some sample XML instances to show how the generator handles various W3C XML Schema features, such as facet restrictions, abstract types, substitution groups, and wildcards.

Facet Restrictions

Let's make three changes to USAddress. The type of the zip element changes from xs:integer to zipIntType, a restriction of xs:integer with minInclusive and maxExclusive facets. The type of the street element changes to streetType, a restriction of xs:string with minLength="13". The type of the state element changes to stateType, an enumeration of five US state codes.

<xsd:simpleType name="zipIntType">
   <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="101101"/>
      <xsd:maxExclusive value="909909"/>
   </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="streetType">
   <xsd:restriction base="xsd:string">
      <xsd:minLength value="13"/>
   </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="stateType">
   <xsd:restriction base="xsd:string">
      <xsd:enumeration value="WA"/>
      <xsd:enumeration value="OR"/>
      <xsd:enumeration value="CA"/>
      <xsd:enumeration value="NY"/>
      <xsd:enumeration value="FL"/>
   </xsd:restriction>
</xsd:simpleType>
<xsd:element name="street" type="tns:streetType"/>
<xsd:element name="state"  type="tns:stateType"/>
<xsd:element name="zip" type="tns:zipIntType"/>

Now the values of the zip, street, and state elements are generated according to the new facet restrictions.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>101101</zip>
  </ShipTo>
  <ShipTo country="US">
    <name>name2</name>
    <street>street______2</street>
    <city>city2</city>
    <state>OR</state>
    <zip>909908</zip>
  </ShipTo>
  <BillTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>101101</zip>
  </BillTo>
</PurchaseOrder>
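The zip values 101101 and 909908 above are the smallest and largest integers allowed by minInclusive="101101" and maxExclusive="909909". The state values come from the enumeration in declaration order. The following sketch shows how inclusive bounds can be derived from the four range facets of an integer type, and how a generator can cycle through an enumeration. IntegerFacets and its methods are hypothetical names, and the sketch covers only integer-valued types.

```csharp
using System;

public static class IntegerFacets
{
    // Smallest legal value. For integers, minExclusive n means the smallest value is n + 1.
    // A null argument means the facet is absent; typeMin is the range of the CLR type.
    public static decimal Low(decimal? minInclusive, decimal? minExclusive, decimal typeMin)
    {
        decimal low = typeMin;
        if (minInclusive.HasValue) low = Math.Max(low, minInclusive.Value);
        if (minExclusive.HasValue) low = Math.Max(low, minExclusive.Value + 1);
        return low;
    }

    // Largest legal value. For integers, maxExclusive n means the largest value is n - 1.
    public static decimal High(decimal? maxInclusive, decimal? maxExclusive, decimal typeMax)
    {
        decimal high = typeMax;
        if (maxInclusive.HasValue) high = Math.Min(high, maxInclusive.Value);
        if (maxExclusive.HasValue) high = Math.Min(high, maxExclusive.Value - 1);
        return high;
    }

    // With an enumeration, only listed values are legal; take them in order, wrapping around.
    public static string FromEnumeration(string[] enumeration, int index)
    {
        return enumeration[index % enumeration.Length];
    }
}
```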

SimpleType Lists

XML Schema has the concept of a list type, whose values are whitespace-separated sequences of atomic values. This allows an element or attribute value to contain one or more values of a single atomic type. By default, the tool generates three values of the item type for an element or attribute whose type is a list. As mentioned earlier, this default can be changed through the ListLength property of the XmlSampleGenerator.

To our purchaseOrder schema, let us add an element Items which holds the list of items to be shipped as part of this order.

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:USAddress"/>
   <xsd:element name="Items" type="tns:ItemsList"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
 </xsd:complexType>

<xsd:simpleType name="ItemsList">
   <xsd:list>
      <xsd:simpleType>
         <xsd:restriction base="xsd:string">
            <xsd:enumeration value="I001"/>
            <xsd:enumeration value="I002"/>
            <xsd:enumeration value="I003"/>
            <xsd:enumeration value="I004"/>
            <xsd:enumeration value="I005"/>
            <xsd:enumeration value="I006"/>
         </xsd:restriction>
      </xsd:simpleType>   
   </xsd:list>
</xsd:simpleType>

The following XML is generated for the above schema.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>101101</zip>
  </ShipTo>
  <ShipTo country="US">
    <name>name2</name>
    <street>street______2</street>
    <city>city2</city>
    <state>OR</state>
    <zip>909908</zip>
  </ShipTo>
  <BillTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>101101</zip>
  </BillTo>
  <Items>I001 I002 I003 </Items>
</PurchaseOrder>
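A list value consists of ListLength item values separated by spaces. The sketch below assumes a hypothetical item generator that is passed in as a function, and ListValues.Generate is not part of the sample. It joins the items without the trailing space that appears in the output above. Either form is valid, because xs:list values are whitespace-collapsed before validation.

```csharp
using System;
using System.Collections.Generic;

public static class ListValues
{
    // Hypothetical helper: produces listLength item values from the item type's
    // generator and joins them with single spaces, the xs:list separator.
    public static string Generate(Func<int, string> itemValue, int listLength)
    {
        var items = new List<string>();
        for (int i = 0; i < listLength; i++)
            items.Add(itemValue(i));
        return string.Join(" ", items);
    }
}
```

With the six I00n enumeration values and the default ListLength of 3, this yields "I001 I002 I003".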

Union Types

XML Schema provides the xs:union construct to allow an element or attribute value to be an instance of any one of several atomic or list types. The types that form the union are called its memberTypes. For an element or attribute whose type is a union, the tool cycles through the member types to generate successive values. To illustrate this, let us use a zipUnion type for our zip element, adapted from the example in the W3C XML Schema Primer.

<xsd:simpleType name="zipUnion">
  <xsd:union memberTypes="tns:stateType tns:zipIntType"/>
</xsd:simpleType>
<xsd:element name="zip" type="tns:zipUnion"/>

Notice how the values of the zip element alternate between stateType and zipIntType in the XML generated below.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>WA</zip>
  </ShipTo>
  <ShipTo country="US">
    <name>name2</name>
    <street>street______2</street>
    <city>city2</city>
    <state>OR</state>
    <zip>101101</zip>
  </ShipTo>
  <BillTo country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>WA</zip>
  </BillTo>
  <Items>I001 I002 I003 </Items>
</PurchaseOrder>
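The alternation can be modeled as a round-robin over per-member generators, where each member keeps its own counter. That model matches the output above: the second zip value is 101101, the first zipIntType value, rather than OR. UnionGenerator is a hypothetical class, and representing member generators as Func<string> is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of union value generation.
public class UnionGenerator
{
    private readonly List<Func<string>> memberGenerators;
    private int next;

    // One generator per member type, in memberTypes declaration order.
    public UnionGenerator(params Func<string>[] memberGenerators)
    {
        this.memberGenerators = new List<Func<string>>(memberGenerators);
    }

    // Returns a value from the next member type, wrapping around. A member's own
    // generator advances only when that member is chosen.
    public string NextValue()
    {
        Func<string> member = memberGenerators[next];
        next = (next + 1) % memberGenerators.Count;
        return member();
    }
}
```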

Abstract Types

An abstract type is a complex type declared with abstract="true". In the XML instance, an element whose declared type is abstract must use the xsi:type attribute to substitute a more specific, derived type, because the element's content cannot be validated against the abstract type itself. Hence, the XML generator uses a type derived from the abstract type (by extension or restriction) to generate the content model of the element.

In the purchase order schema above, suppose we want a common address type with the elements name, street, and city, and a more specific type for US addresses that constrains the state and zip values. We can define the common address type as abstract and derive the US address type from it.

Let us change the type of the ShipTo and BillTo elements from USAddress to Address, and make Address abstract, as shown below.

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:Address" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:Address"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
 </xsd:complexType>

<xsd:complexType name="Address" abstract="true">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="tns:streetType"/>
   <xsd:element name="city"   type="xsd:string"/>
  </xsd:sequence>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:complexContent>
   <xsd:extension base="tns:Address">
    <xsd:sequence>
      <xsd:element name="state"  type="tns:stateType"/>
      <xsd:element name="zip"    type="tns:zipUnion"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
   </xsd:extension>
  </xsd:complexContent>
 </xsd:complexType>

Note the generation of the xsi:type attribute with the correct derived type name in the XML generated below.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo xsi:type="USAddress" country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>WA</zip>
  </ShipTo>
  <ShipTo xsi:type="USAddress" country="US">
    <name>name2</name>
    <street>street______2</street>
    <city>city2</city>
    <state>OR</state>
    <zip>101101</zip>
  </ShipTo>
  <BillTo xsi:type="USAddress" country="US">
    <name>name1</name>
    <street>street______1</street>
    <city>city1</city>
    <state>WA</state>
    <zip>WA</zip>
  </BillTo>
  <Items>I001 I002 I003 </Items>
</PurchaseOrder>
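To choose a concrete type in place of an abstract one, the generator must find the types derived from it. One way to do this with the compiled Schema Object Model is to scan GlobalTypes and test each candidate with the static XmlSchemaType.IsDerivedFrom method, as sketched below. AbstractTypes and its methods are hypothetical names, and the sketch considers only global complex types.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class AbstractTypes
{
    public static XmlSchemaSet Compile(string schemaText)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(schemaText)));
        set.Compile();
        return set;
    }

    // Names of the non-abstract global complex types derived, by extension or
    // restriction, directly or indirectly, from the given abstract type.
    public static List<string> ConcreteDerivedTypes(XmlSchemaSet set, XmlQualifiedName abstractTypeName)
    {
        var baseType = (XmlSchemaType)set.GlobalTypes[abstractTypeName];
        var result = new List<string>();
        foreach (XmlSchemaType candidate in set.GlobalTypes.Values)
        {
            var complexType = candidate as XmlSchemaComplexType;
            if (complexType == null || complexType.IsAbstract || complexType == baseType)
                continue;
            // XmlSchemaDerivationMethod.Empty: no derivation method is excluded from the test.
            if (XmlSchemaType.IsDerivedFrom(complexType, baseType, XmlSchemaDerivationMethod.Empty))
                result.Add(complexType.Name);
        }
        return result;
    }
}
```

For the schema above, this returns USAddress, which is the name written into the xsi:type attributes.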

Substitution Groups

Substitution groups are the mechanism XML Schema provides for replacing one element in the schema with another in the instance. One or more elements can be marked as substitutable for a global element, called the head element. This means that members of the substitution group are interchangeable with the head element in a content model. The only requirement is that each member of the substitution group must have the same type as the head element or a type derived from it.

While generating content models, if the tool encounters an element which is the head of a substitution group, it will use member elements of the group in place of the original element.

Let us add an element called Details to the PurchaseOrderType to include more details about the order itself. The Details element is abstract, and ShipDetails and BillDetails are member elements of the substitutionGroup headed by the Details element.

<xsd:element name="Details" abstract="true"/>
<xsd:element name="ShipDetails" type="tns:sDetailsType"   
                                substitutionGroup="tns:Details"/>
<xsd:element name="BillDetails" type="tns:bDetailsType" 
                                substitutionGroup="tns:Details"/>

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:Address" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:Address"/>
   <xsd:element name="Items" type="tns:ItemsList"/>
   <xsd:element ref="tns:Details" maxOccurs="2"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
</xsd:complexType>

<xsd:complexType name="sDetailsType">
  <xsd:attribute name="GiftWrap" default="false" type="xsd:boolean"/>
</xsd:complexType>

<xsd:complexType name="bDetailsType">
  <xsd:attribute name="PaymentType" default="Credit" type="xsd:string"/>
</xsd:complexType>

For every occurrence of the Details element, one member of its substitutionGroup is chosen to be generated in the instance.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <ShipTo xsi:type="USAddress" country="US">
    ...
  </ShipTo>
  <ShipTo xsi:type="USAddress" country="US">
    ...
  </ShipTo>
  <BillTo xsi:type="USAddress" country="US">
    ...
  </BillTo>
  <Items>I001 I002 I003 </Items>
  <ShipDetails GiftWrap="false" />
  <BillDetails PaymentType="Credit" />
</PurchaseOrder>
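Finding the members of a substitution group can be sketched by scanning GlobalElements for elements whose SubstitutionGroup property names the head element, and skipping abstract members. SubstitutionGroups.Members is a hypothetical helper. The sketch finds only direct members; it does not follow chains in which a member is itself the head of another group.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SubstitutionGroups
{
    public static XmlSchemaSet Compile(string schemaText)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(schemaText)));
        set.Compile();
        return set;
    }

    // Names of the non-abstract global elements whose substitutionGroup attribute
    // names the given head element (direct members only).
    public static List<string> Members(XmlSchemaSet set, XmlQualifiedName head)
    {
        var members = new List<string>();
        foreach (XmlSchemaElement element in set.GlobalElements.Values)
        {
            if (element.SubstitutionGroup == head && !element.IsAbstract)
                members.Add(element.QualifiedName.Name);
        }
        return members;
    }
}
```

Choosing members in turn, one per occurrence of Details, produces output like the ShipDetails and BillDetails elements shown above.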

Wildcards

xs:any and xs:anyAttribute are the wildcards W3C XML Schema provides to allow elements and attributes from specified namespaces to appear in the XML instance. Their namespace and processContents attributes give the schema author some control over where open content can occur and how it is validated.

The following list shows the element and attribute generated for each kind of wildcard declared in the schema.

  • processContents="skip" or processContents="lax":
    • namespace="##any" or "##targetNamespace": an element named any_element of type xs:anyType and an attribute named any_Attr of type xs:anySimpleType, both in the target namespace.
    • namespace="##local": any_element (xs:anyType) and any_Attr (xs:anySimpleType) in no namespace (string.Empty).
    • namespace="##other": any_element (xs:anyType) and any_Attr (xs:anySimpleType) in the namespace "otherNS".
    • A list of namespaces: the first available element and the first available attribute in one of the namespaces in the list.
  • processContents="strict":
    • Any namespace value (##any, ##targetNamespace, ##local, ##other, or a user-specified list of namespaces): the first available element and the first available attribute in the specified namespace.

If a wildcard cannot be matched to an element or attribute in the specified namespace, a comment is generated in the XML to indicate that the instance may not be valid.
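Matching a wildcard comes down to testing a candidate namespace against the wildcard's namespace attribute. The sketch below implements the XML Schema 1.0 rules for ##any, ##other, ##local, ##targetNamespace, and explicit namespace lists, with the empty string standing for "no namespace". FirstMatch then looks for the first global element that satisfies the constraint, in the spirit of the GlobalElements lookup described earlier. WildcardMatching and its methods are hypothetical names.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

public static class WildcardMatching
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // True if namespaceUri satisfies the namespace attribute of xs:any/xs:anyAttribute.
    // An empty string stands for "no namespace" in both targetNamespace and namespaceUri.
    public static bool Allows(string namespaceConstraint, string targetNamespace, string namespaceUri)
    {
        string constraint = namespaceConstraint.Trim();
        if (constraint == "##any")
            return true;
        if (constraint == "##other")
            // XML Schema 1.0: any namespace except the target namespace, and not "no namespace".
            return namespaceUri.Length != 0 && namespaceUri != targetNamespace;

        // Otherwise the constraint is a whitespace-separated list of namespace tokens.
        foreach (string token in constraint.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
        {
            string allowed = token == "##targetNamespace" ? targetNamespace
                           : token == "##local" ? string.Empty
                           : token;
            if (allowed == namespaceUri)
                return true;
        }
        return false;
    }

    // First global element whose namespace satisfies the constraint, or null if none does.
    public static XmlQualifiedName FirstMatch(XmlSchemaSet set, string namespaceConstraint, string targetNamespace)
    {
        foreach (XmlSchemaElement element in set.GlobalElements.Values)
        {
            if (Allows(namespaceConstraint, targetNamespace, element.QualifiedName.Namespace))
                return element.QualifiedName;
        }
        return null;
    }
}
```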

Suppose that, as authors of the purchase order schema, we want to allow elements and attributes from other custom namespaces in the instances we process. In that case, we can add an xs:any and an xs:anyAttribute with namespace="##other" to PurchaseOrderType.

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:Address" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:Address"/>
   <xsd:element name="Items" type="tns:ItemsList"/>
   <xsd:element ref="tns:Details" maxOccurs="2"/>
   <xsd:any namespace="##other" processContents="lax"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
  <xsd:anyAttribute namespace="##other" processContents="strict"/>
 </xsd:complexType>

The purchase order schema does not import any other namespace, so no global element or attribute is available to match the xs:any or the xs:anyAttribute wildcard.

Because the processContents of the xs:any is lax, an element with the local name any_element in the namespace otherNS is generated.

Because the processContents of the xs:anyAttribute is strict and no attribute can be matched, a comment noting this is generated, as shown below.

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" OrderDate="1900-01-01" xmlns="http://tempuri.org">
  <!-- Attribute Wild card could not be matched. Generated XML may not be valid. -->
  <ShipTo xsi:type="USAddress" country="US">
    ...
  </ShipTo>
  <ShipTo xsi:type="USAddress" country="US">
    ...
  </ShipTo>
  <BillTo xsi:type="USAddress" country="US">
    ...
  </BillTo>
  <Items>I001 I002 I003 </Items>
  <ShipDetails GiftWrap="false" />
  <BillDetails PaymentType="Credit" />
  <any_element xmlns="otherNS">anyType</any_element>
</PurchaseOrder>

Limitations of the XmlSampleGenerator

  • The W3C XML Schema Identity Constraints (xs:key, xs:keyref, xs:unique) are not supported while generating an instance document.
  • If xs:pattern facets exist on simple types, values generated may not conform to the pattern. Enumerations of the xs:QName type may not work as expected since this requires the prefixes in the schema to be preserved.
  • xs:ENTITY, xs:ENTITIES, and xs:NOTATION types are not supported.
  • xs:base64Binary content is generated only if enumerations exist in the schema for that type.

Conclusion

The XmlSampleGenerator tool is built extensively on the System.Xml.Schema API, and should also serve as a useful example of how to navigate the Schema Object Model.

It also introduces some of the new schema APIs in the .NET Framework 2.0 Beta 1, such as the TypeCode and Variety properties on XmlSchemaDatatype. TypeCode returns the built-in type on which the current type is based, which eliminates the need to walk back up the type hierarchy. Variety returns whether the type is atomic, list, or union.
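The following sketch reads both properties from compiled simple types taken from this article's schema. DatatypeInfo.Describe is a hypothetical helper. XmlSchemaSimpleType.Datatype, XmlSchemaDatatype.TypeCode, and XmlSchemaDatatype.Variety are populated when the schema set is compiled.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class DatatypeInfo
{
    // Compiles the schema text and returns the datatype of a named global simple type.
    public static XmlSchemaDatatype Describe(string schemaText, XmlQualifiedName typeName)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(schemaText)));
        set.Compile();
        var simpleType = (XmlSchemaSimpleType)set.GlobalTypes[typeName];
        return simpleType.Datatype;  // available only after compilation
    }
}
```

For zipIntType, TypeCode is XmlTypeCode.Integer and Variety is XmlSchemaDatatypeVariety.Atomic. For ItemsList and zipUnion, Variety is List and Union respectively.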

The sample also includes XmlGen.exe, a command-line utility that generates XML using the XmlSampleGenerator library. The utility writes the generated XML to a file called Sample.xml in the current directory.