Effects of the type substitution mechanism on static typing (part I)

The XML Schema specification defines a mechanism that allows derived types to be used in instance documents. In short, if a schema contains an element ‘E’ of type ‘T’, and there is a type ‘T1’ derived from ‘T’, then we can set the type of any instance of ‘E’ to ‘T1’ like this:
<E xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="T1">...</E>
When that element instance is validated against the schema, it is type ‘T1’, not type ‘T’, that is used to determine whether its value or content is valid.

If you’re not familiar with this functionality, you will find a simple (although non-normative) explanation in section 4.3 of the W3C’s XSD primer.

Let’s look at an example. First we’re going to define a schema with the following components

  • A complex type ‘myType’
  • Two complex types ‘mySubType1’ and ‘mySubType2’ derived by extension from ‘myType’
  • A complex type ‘myOtherSubType1’ derived by extension from ‘mySubType1’
  • An element 'root' of type 'myType'

CREATE XML SCHEMA COLLECTION myCollection AS '
<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://ns"
xmlns:ns="https://ns">

 <complexType name="myType">
<sequence>
<element name="a" type="string"/>
</sequence>
</complexType>

 <element name="root" type="ns:myType"/>

 <complexType name="mySubType1">
<complexContent>
<extension base="ns:myType">
<sequence>
<element name="b" type="integer"/>
</sequence>
<attribute name="a1" type="string"/>
</extension>
</complexContent>
</complexType>

 <complexType name="mySubType2">
<complexContent>
<extension base="ns:myType">
<sequence>
<element name="b" type="boolean"/>
</sequence>
<attribute name="a2" type="string"/>
</extension>
</complexContent>
</complexType>

 <complexType name="myOtherSubType1">
<complexContent>
<extension base="ns:mySubType1">
<sequence>
<element name="c" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>

</schema>'
go

All four of the following instances are valid against our schema. Note how the last three use the type substitution mechanism through the xsi:type attribute.

<x:root xmlns:x="https://ns">
<a>Data</a>
</x:root>

<x:root xmlns:x="https://ns" xsi:type="x:mySubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>1</b>
</x:root>

<x:root xmlns:x="https://ns" xsi:type="x:mySubType2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a2="attribute">
<a>Data</a>
<b>true</b>
</x:root>

<x:root xmlns:x="https://ns" xsi:type="x:myOtherSubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>1</b>
<c>Data</c>
</x:root>

As you can see, the content model of an instance of element ‘root’ is not necessarily the content model defined in complex type ‘myType’. Through the type substitution mechanism, it can be the content model of any of the types derived from ‘myType’.
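The flip side is that validation follows the type named in xsi:type. Here is a minimal sketch, assuming myCollection was created as shown above (the variable name @bad is just for illustration). The instance claims to be of type ‘mySubType1’ but supplies a boolean for ‘b’, so the assignment is expected to fail validation:

```sql
DECLARE @bad XML(myCollection)
-- xsi:type selects mySubType1, where element b is declared as an integer,
-- so the value 'true' should be rejected when the instance is validated.
SET @bad = '<x:root xmlns:x="https://ns" xsi:type="x:mySubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>true</b>
</x:root>'
go
```

The same content with xsi:type="x:mySubType2" (and attribute a2 instead of a1) is one of the valid instances listed above.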

Now let’s run the following XQuery expression against one of these instances.

DECLARE @var XML(myCollection)
SET @var = '<x:root xmlns:x="https://ns" xsi:type="x:mySubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>1</b>
</x:root>'
SELECT @var.query(' declare namespace ns="https://ns"; data((/ns:root/b)[1]) + 1')
go

It fails with the following error message:
XQuery [query()]: The argument of '+' must be of a single numeric primitive type or 'http://www.w3.org/2004/07/xpath-datatypes#untypedAtomic'. Found argument of type '(xs:boolean | xs:integer) ?'.

At first glance this query might have looked valid. In our XML instance, element ‘root’ is cast to type ‘mySubType1’, so we could expect the path expression data((/ns:root/b)[1]) to return a value of type xs:integer. This reasoning, however, fails to take into account the static type analysis that occurs when the server compiles an XQuery expression.

When SQL Server 2005 performs the static analysis of our XQuery expression, it considers all possible uses of the xsi:type attribute in the XML instance. In our case, SQL Server 2005 determines that an instance of element ‘root’ can have one of four different content models. If the type substitution mechanism isn’t used, element ‘root’ cannot have a child named ‘b’. If ‘root’ is cast to type ‘mySubType1’ or ‘myOtherSubType1’, it will contain an element ‘b’ of type xs:integer. If it is cast to type ‘mySubType2’, it will contain an element ‘b’ of type xs:boolean. Therefore the static type of the expression data((/ns:root/b)[1]) is (xs:boolean | xs:integer) ?, where the trailing ‘?’ means the value may be absent (there may be no ‘b’ at all). Since the ‘+’ operator cannot be applied to an operand of type (xs:boolean | xs:integer) ?, query compilation fails and an error is returned.
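By contrast, element ‘a’ is declared as a string in ‘myType’ and is inherited unchanged by every derived type, so its static type should be the same whichever xsi:type is used. As a small sketch (the variable name @var2 is only an illustration, and myCollection is assumed to exist as created above), a query that only touches ‘a’ is expected to compile without complaint:

```sql
DECLARE @var2 XML(myCollection)
SET @var2 = '<x:root xmlns:x="https://ns" xsi:type="x:mySubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>1</b>
</x:root>'
-- 'a' is a string in all four possible content models of 'root',
-- so no union of conflicting types arises for this path expression.
SELECT @var2.query('declare namespace ns="https://ns"; data((/ns:root/a)[1])')
go
```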

Once again you can see that static typing has nothing to do with the actual XML data the query is run on. The static type of an expression is determined by looking at the schema(s).
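One way to convince yourself of this is to run the same expression against the same data stored in an untyped XML variable. This is only a sketch (the variable name @untyped is made up for the example). With no schema attached, ‘b’ should be treated as untyped atomic data, which the error message above lists as an acceptable argument for ‘+’, so the query is expected to compile and run:

```sql
DECLARE @untyped XML   -- no schema collection: the instance is untyped
SET @untyped = '<x:root xmlns:x="https://ns" xsi:type="x:mySubType1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" a1="attribute">
<a>Data</a>
<b>1</b>
</x:root>'
-- Same data and same expression as the failing query; only the static
-- type information available to the compiler has changed.
SELECT @untyped.query('declare namespace ns="https://ns"; data((/ns:root/b)[1]) + 1')
go
```

The data is identical in both cases. The only difference is what the compiler knows about it.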

Since this is such a long post already I will stop here for today. More on that topic in the next few days.

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.