Secondary XML Indexes

To enhance search performance, you can create secondary XML indexes. A primary XML index must first exist before you can create secondary indexes. These are the types:

  • PATH secondary XML index

  • VALUE secondary XML index

  • PROPERTY secondary XML index

Following are some guidelines for creating one or more secondary indexes:

  • If your workload uses path expressions significantly on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of the exist() method on XML columns in the WHERE clause of Transact-SQL.

  • If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.

  • If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for <book> elements that have some attribute having the value "novel".

PATH Secondary XML Index

If your queries generally specify path expressions on xml type columns, a PATH secondary index may be able to speed up the search. As described earlier in this topic, the primary index is helpful when you have queries that specify exist() method in the WHERE clause. If you add a PATH secondary index, you may also improve the search performance in such queries.

Although a primary XML index avoids having to shred the XML binary large objects at run time, it may not provide the best performance for queries based on path expressions. Because all rows in the primary XML index corresponding to an XML binary large object are searched sequentially for large XML instances, the sequential search may be slow. In this case, having a secondary index built on the path values and node values in the primary index can significantly speed up the index search. In the PATH secondary index, the path and node values are key columns that allow for more efficient seeks when searching for paths. The query optimizer may use the PATH index for expressions such as those shown in the following:

  • /root/Location which specify only a path

OR

  • /root/Location/@LocationID[.="10"] where both the path and the node value are specified.

The following query shows where the PATH index is helpful:

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') AS Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/@ProductModelID[.="19"]') = 1

In the query, the path expression /PD:ProductDescription/@ProductModelID and value "19" in the exist() method correspond to the key fields of the PATH index. This allows for direct seek in the PATH index and provides better search performance than the sequential search for path values in the primary index.

VALUE Secondary XML Index

If queries are value based, for example, /Root/ProductDescription/@*[. = "Mountain Bike"] or //ProductDescription[@Name = "Mountain Bike"], and the path is not fully specified or it includes a wildcard, you might obtain faster results by building a secondary XML index that is built on node values in the primary XML index.

The key columns of the VALUE index are (node value and path) of the primary XML index. If your workload involves querying for values from XML instances without knowing the element or attribute names that contain the values, a VALUE index may be useful. For example, the following expression will benefit from having a VALUE index:

  • //author[LastName="someName"] where you know the value of the <LastName> element, but the <author> parent can occur anywhere.

  • /book[@* = "someValue"] where the query looks for the <book> element that has some attribute having the value "someValue".

The following query returns BusinessEntityID from the Person table. The WHERE clause specifies a filter that looks for values in the AdditionalContactInfoxml type column. The business entity IDs are returned only if the corresponding additional contact information XML binary large object includes a specific telephone number. Because the <telephoneNumber> element may appear anywhere in the XML, the path expression specifies the descendent-or-self axis.

WITH XMLNAMESPACES (
  'https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo' AS CI,
  'https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes' AS ACT)

SELECT BusinessEntityID 
FROM   Person.Person
WHERE  AdditionalContactInfo.exist('//ACT:telephoneNumber/ACT:number[.="111-111-1111"]') = 1

In this situation, the search value for <number> is known, but it can appear anywhere in the XML instance as a child of the <telephoneNumber> element. This kind of query might benefit from an index lookup based on a specific value.

PROPERTY Secondary Index

Queries that retrieve one or more values from individual XML instances might benefit from a PROPERTY index. This scenario occurs when you retrieve object properties by using the value() method of the xml type and when the primary key value of the object is known.

The PROPERTY index is built on columns (PK, Path and node value) of the primary XML index where PK is the primary key of the base table.

For example, for product model 19, the following query retrieves the ProductModelID and ProductModelName attribute values using the value() method. Instead of using the primary XML index or the other secondary XML indexes, the PROPERTY index may provide faster execution.

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.value('(/PD:ProductDescription/@ProductModelID)[1]', 'int') as ModelID,
       CatalogDescription.value('(/PD:ProductDescription/@ProductModelName)[1]', 'varchar(30)') as ModelName        
FROM Production.ProductModel   
WHERE ProductModelID = 19

Except for the differences described later in this topic, creating an XML index on anxml type column is similar to creating an index on a non-xml type column. The following Transact-SQL DDL statements can be used to create and manage XML indexes: