Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on "SQL Server 2008 XML Indexing" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
XML Indexing
- Engine needs to XML in order to process it
- Solution: XML Indexes – pre-parsed representation of XML data
- Include tages, values, path from the XML instance, turned into a hidden table
- You can actually see the table if connected via admin connection
- XML indexes almost always speed up XML queries, but they do come at a cost
- Use typed XML as much as you can. Optimizer will like it.
- See https://msdn.microsoft.com/en-us/library/ms191497.aspx
XML Indexing - Details
- Types: Primary (base) and Secondary (PATH, VALUE, PROPERTY – Non-clustered on the primary)
- Storage: Increases XML storage cost. Estimate as 3 times the storage requirement of XML instance in base table.
- To create the primary XML index, table must have a clustered primary key
- Stored in the same FG/partition as the base table
- Primary XML index includes column for ordpath, to keep track of hierarchy
- Ordpath also used in SQL Server 2008 for HierachyID data type
- Primary index has 11 columns, including id, nid (node), hid (hierarchy), value, lvalue, PK[n]
- See https://msdn.microsoft.com/en-us/library/bb500237.aspx
XML Indexing – Demo
- Create XML Schema, Create Table with XML column
- Gathering data for products, turning into some nice XML
- Put XML data into a table (no index)
- Query WHERE X.Exist (‘//Name[.=”Adjustable Race”’), look at the plan
- Create an index, run the query again, look at the plan again
- Look at sys.xml_indexes
XML Indexing – Secondary Indexes
- Path index – (‘/a/b[.=”abc”]) - Good for path queries (index on hid, value)
- Value index – (‘//b[.=”abc”]) - Optimized to find the value and we don’t know the path (index on value, hid)
- Property index – .value(‘/a[1]/b[1]’)='abc' - Optimized for name-value pair storage (index on PK[n], hid, value)
- See https://msdn.microsoft.com/en-us/library/bb522562.aspx
XML FullText Searching
- XPath and XQuery expressions – case insensitive
- FTS will index element values only – tags and attributes will be ignored
- FTS can do initial filter, then more exact XML methods
Shred vs. Store Decision
- Performance considerations – relational query is faster, shredding and re-combining is slow
- Data Structure considerations – XML is extremely flexible, but loosely structured
- Data lifecycle considerations – if you input and output XML, don’t shred
- Storing as XML data type – XML data type will check, strip, eliminate extra spaces.
- Storing as [N]VARCHAR(MAX) – If you care about white space or never do any XML operations
- There are cases where you need to store both
- Schema binding – Typed XML will be validated, schema will improve XML query performance
- Schema binding – Uses more space, no wildcard-based queries, schema migration can be difficult
- You can add relational constraints on top of those expressed as XSD
Where to put the XML
- Need a clustering key in the table
- Consider using a separate table if you can’t have a clustering key
- Consider using in-row XML column if smaller than 8,000 bytes
- You can “promote a property” to a relational column for better indexing and relational querying using UDF
Markup choice
- Element-centric - <a><b>val</b></a>
- - Element values considered to be content
- - Verbose, more human readable. Maybe yield simpler qiery plans
- Attribute-centric - <a b=”val”/>
- - Lower storage requirements
Queries
- Context node is a dot (.) - X.Exist(‘/a/b[@attr=”abc”]’)=1 not same as X.Exist(‘/a/b[.@attr=”abc”]’)=1
- Avoid wildcars – Node level - X.Exist(‘//b[@attr=”abc”]’)=1 – Attribute level - X.Exist(‘/a/b/@*[.=”abc”]’)=1
- Singletons – /a/b - /a[1]/b[1] – (/a/b)[1]
- Outer Data binding
- Merging Value Executions
- Favor Exist() over Value()
Shredding
- OPENXML – backward compatible, will be deprecated. Always use sp_xml_removedocument
- Nodes() – newer, current, faster (test both to be sure).
Comments
- Anonymous
January 01, 2003
Thank you for submitting this cool story - Trackback from DotNetShoutout - Anonymous
May 20, 2014
Pingback from XML Large DataBase Is The Best? | Click & Find Answer !