Hierarchy ID

Model Your Data Hierarchies With SQL Server 2008

Kent Tegels

Code download available at:SQLHierarchyID2008_09a.exe(155 KB)

This article is based on a prerelease version of SQL Server 2008 RC0. All information herein is subject to change.

This article discusses:

  • Modeling hierarchical data
  • Creating a bill of materials system
  • Moving to HierarchyID
  • Testing the system
This article uses the following technologies:
SQL Server 2008

Contents

Hierarchical Data
A Bill of Materials Problem
Using Entities to Understand the Problem
A BOM System in SQL Server 2005
Create the Tables
Queries to Validate the Design
Taking Advantage of HierarchyID
Testing the HierarchyID Implementation
Wrapping Up

The manufacturing system behind automobiles; the organization of a country into states, counties, cities, and postal codes; the description of a home entertainment system—what do these things have in common? The simple answer is that each describes a hierarchy.

SQL Server 2008 supports a new data type, HierarchyID, that helps solve some of the problems in modeling and querying hier­archical information. I will introduce you to this data type by discussing a pattern commonly used in manufacturing known as bill of materials (BOM), or bills. Starting with a brief discussion of BOMs, I will illustrate how this kind of data can be modeled. I will also present an implementation of this model in SQL Server 2005. Then I will show you how the HierarchyID data type can be used to implement the model in SQL Server 2008.

Hierarchical Data

Automobiles are amalgamations of many components, such as engines, drivetrains, electronics, and steering. In the United States, our geographic territories are divided into states and are then sub-divided into jurisdictions called counties. Counties are then further subdivided in different ways by different agencies. The United States Census Bureau, for example, composes them from Census Tract Areas. The U.S. Postal Service routes mail delivery by Zone Improvement Plan (ZIP) codes. Geographic information systems (GIS) may aggregate census tracts and ZIP codes together to provide users with a familiar spatial reference for an area.

A recent trip to a local electronics store to evaluate a replacement home entertainment system pointed to a similar sort of hierarchical system—all the combinations of possible components and options left my head spinning! I wondered how such systems could be modeled and implemented in a database system.

The relationship between an automobile and its engine represents a hierarchy: the automobile contains the engine. The relationship is the same for the drivetrain, the electronics, and the steering. The relationship is containment. A similar hierarchy can be observed in the relationship between the different groupings of geographic or census data.

Hierarchies exist everywhere, yet implementing them in the context of a relational database frequently proves to be a challenge. A typical approach is to represent the hierarchy using a parent/child relationship with one or more tables. While this approach certainly works in many cases, it has a few shortcomings. Such solutions must carefully consider how the referential integrity will be maintained. And while querying the depth and breadth of such tables was considerably simplified in SQL Server 2005 with the introduction of recursive common table expressions, writing queries against these types of tables can still be problematic when joins against many tables are required.

A Bill of Materials Problem

A few years ago I was working on a system being developed by a manufacturing company to help their dealers specify the components needed to build center-pivot irrigation systems. The software produced a list of components needed to custom-build the desired pivot (the totality of a center-pivot irrigation system is simply referred to as a pivot within the industry). The required components were determined based on geography, soil type, and the intended crops planted in the areas to be covered as well as the hydrologic and structural considerations of the device itself.

Underpinning the solution would be a SQL Server database. The purpose of the database was to store information about the components available to build the pivot. However, when we generated the specification for manufacturing, we needed to identify those components as BOMs.

Some bills represented a collection of physical parts that would be assembled into a system component. For example, every pivot needed a pump to draw water from a well into the system. That pump might be electrically powered, meaning it needed a transformer and fuse box, too. Or the pump might be fuel powered, meaning it needed a tank, a fuel pump, and hoses to connect the pump to the tank. In either case, the required parts for the pump would be listed in a pump bill.

The bill for a complete pivot would include a collection of other bills. For example, a standardized pivot might consist of a tree of bills for the pump, another tree of bills for the spans of pipe used to deliver water, and bills for any other equipment needed to build that pivot system.

Using Entities to Understand the Problem

The first question you should ask is: what are the entities you need to be concerned with in representing a hierarchical system and what attributes might those entities have? The trick is to not be drawn into the semantics of the individual elements in the underlying system you are trying to model.

For example, consider the case of the PC-based home theater system shown in Figure 1 . It might be easy to say that each item shown here represents an entity. While this is true, it becomes difficult to represent all of the possible home theatre systems that a company might sell. What if some systems use a digital video recorder (DVR) instead of a PC? What about consumers who don't want a radio tuner?

fig01.gif

Figure 1 Components within a Home Theater System (Click the image for a larger view)

Variations like these are precisely why a BOM system works well as a pattern for hierarchical data modeling problems. It allows for a complete system to be composed from many different sub-bills and sub-sub-bills until you eventually get down to individual parts in a tree data structure. At the root of the tree is a complete system represented by a single bill. The first level of branches is generally comprised of bills as well. The list of sub-bills continues down the hierarchy until a bill consists only of a list of physical system parts, and these parts become the leaf nodes on the tree.

So what entities do we have in the system? There are bills that contain only parts (parts lists), and there are bills that contain other bills.

A part has a description and a cost. (Of course, it could have many other attributes, but let's keep things simple for illustration purposes.) A parts list's attributes might include a part and the quantity of that part required. It is important to separate the quantity of a particular part needed from the part information itself. If you fail to do that, you may end up having many duplicate instances of a part that is based only on the difference in the quantity used. Normalization rules suggest that this is an inappropriate design in most cases.

A bill will have a description, a way to associate it with its parent bill, and an associated parts list. Figure 2 represents the set of entities and their relationships.

fig02.gif

Figure 2 Relationship of Entities (Click the image for a larger view)

While relational databases are great at representing most relationships, they are not well suited to working with many-to-many entity models. It is not that the physical tables cannot handle the data; rather, it is difficult to create appropriate referential integrity constraints between the tables. Such relationships also lead to complex queries. This problem is easy to fix, however, when you first address the logical design of the tables. You need to insert a table between the part list table and the bill table. That table will simply hold a reference to a bill and a reference to a parts list. Figure 3 shows an entity relationship diagram including this extra table.

fig03.gif

Figure 3 Entity Relationship Diagram for a Simplified Bill of Materials System
(Click the image for a larger view)

A BOM System in SQL Server 2005

Now that you've seen the theory underpinning a BOM system, it's time to implement one. The procedure is straightforward.

  1. Create the database
  2. Create the tables shown in the entity relationship diagram
  3. Add the constraints required on the appropriate tables
  4. Configure access permissions on the tables as needed
  5. Populate the tables with test data
  6. Write and execute queries against the tables in order to validate the design

Here is the T-SQL code for creating a sample database. You will likely need to change the filename paths for use on your system:

create database [dbBom] on  primary (   
  name = n'dbBom', filename = n'c:\msdnmag\dbBom.mdf', 
  size = 50mb , maxsize = unlimited, filegrowth = 2mb )
log on ( 
  name = n'dbBom_log', filename = n'c:\msdnmag\dbBom_log.ldf', 
  size = 10mb , maxsize = 2048gb , filegrowth = 10%);

After you have successfully executed this statement, make sure to change your database scope to dbBOM, such as with a USE dbBOM statement.

Create the Tables

Following is the executable T-SQL code for creating the four required tables:

create table dbo.part(
  partID smallint not null,
  descr varchar(50) not null,
  cost money not null);
create table dbo.partList(
  partListID int not null,
  partID smallint not null,
  quantity smallint not null);
create table dbo.billPartList(
  billID int not null,
  partListID int not null);  
create table dbo.bill(
  billID int not null,
  parentBillID int null,  
  descr varchar(50) not null);

Note that the only column in these tables allowed to have a null value is the parentBillID in dbo.bill. This null value allows us to distinguish a bill that represents a complete system from other bills.

A key to making this implementation referentially valid is getting the constraints on the tables correct. The tables have different functions in the solution, so they will have different constraints. However, every table in SQL Server should have a primary key associated with it, like so:

alter table dbo.part  
  add constraint pkPart 
  primary key clustered(partID);
alter table dbo.bill
  add constraint pkBill 
  primary key clustered(billID);
alter table dbo.partList  
  add constraint pkPartList 
  primary key clustered(partListID);
alter table dbo.billPartList
  add constraint pkBillPartList 
  primary key clustered(billID,partListID);

Some of these columns should always have a unique value, like the description (descr) of a part or bill. You can also prevent users from creating what amounts to a duplicate part list entry by requiring the combination of a partID and the quantity of that part required:

alter table dbo.part  
  add constraint uqDescr 
  unique(descr);
alter table dbo.bill
  add constraint uqBill 
  unique(descr);
alter table dbo.partList  
  add constraint uqPartList 
  unique(partID,quantity); 

Finally, add the foreign key constraints between the tables. The first constraint is between dbo.part and dbo.partList. This will require that the part being added to the part list first be defined in the dbo.part table. If a part is deleted from dbo.part, you want that change to be reflected in the part list. That is easily done by adding the "on delete cascade" clause to the constraint. Since the partID is marked as an identifier, it cannot be changed and thus no cascading action should be taken:

alter table dbo.partList  
  add constraint fkPartList_Part 
  foreign key(partID) 
  references dbo.part(partID)
  on delete cascade
  on update no action;

The next constraint requires that if a bill is assigned a parentBill­ID, the bill referenced by billID must already exist in the dbo.bill table. The cascading constraints are different. You are prohibited from having a cascading constraint here because SQL Server cannot guarantee that an infinite recursion of actions might take place when you delete or update a parent bill. Since no defaults have been defined, "set default" is not an option either.

alter table dbo.bill  
  add constraint fkBill_Bill 
  foreign key(parentBillID) 
  references dbo.bill(BillID)
  on delete no action
  on update no action; 

The billPartList table requires a couple of foreign key constraints. First, you need to require that the part list being associated with a bill exists in the PartList table. Also, the bill being referenced must be within the dbo.bill table. Since neither the billID from dbo.bill nor the partListID from dbo.partList can be updated, there is no cascade action possible for the constraints. However, if a part or bill is deleted, those changes may affect this table:

alter table dbo.billPartList
  add constraint fkBillPartList_PartList 
  foreign key(partListID) 
  references dbo.partList(partListID)
  on delete cascade
  on update no action;
alter table dbo.billPartList
  add constraint fkBillPartList_Bill 
  foreign key(billID) 
  references dbo.bill(billID)
  on delete cascade
  on update no action;

You can now insert some data into the tables for testing. Given the amount of data involved, I will not show the code for that here. If you download the sample code for this article, simply open and run the script named 01_data.sql to generate this sample data.

Queries to Validate the Design

Writing and executing some simple queries will help test and validate that the design works as expected. My first query uses the new common table expressions syntax to produce a hierarchical list of bills:

with c as (
  select billID,parentBillID,descr,0 as [level]
  from dbo.bill b 
  where b.parentBillID is null
    union all
  select b.billID,b.parentBillID, 
    b.descr,[level] + 1
  from dbo.bill b join c on b.parentBillID = 
    c.billID)
select descr,[level],billID,parentBillID 
  as bill 
from c

In the returned data shown in Figure 4 , there is a single root node (where the parentBillID is null). If you had more than one such bill, you would need to change the query to select that parent by the bill's ID. (For more on common table expressions, see the October 2007 Data Points column at msdn.microsoft.com/magazine/cc163346 .)

fig04.gif

Figure 4 Querying a Hierarchy of Bills (Click the image for a larger view)

The next query builds on the first by listing the each of the parts used for the system bill returned by the first query. This query exercises all of the relationships between the tables:

with c as (
  select billID,parentBillID,descr,0 as [level]
  from dbo.bill b 
  where b.parentBillID is null
    union all
  select b.billID,b.parentBillID,b.descr,[level] + 1
  from dbo.bill b join c on b.parentBillID = c.billID)
select p.partID,p.descr
from c
join dbo.billPartList bpl on c.billID = bpl.billID
left join dbo.partList pl on bpl.partListID = pl.partListID
join dbo.part p on pl.partID = p.partID
group by p.partID,p.descr;

The next query then computes the manufacturer's suggested retail price (MSRP) for the system by summing the cost of the required system parts and multiplying by 2:

with c as (
  select billID,parentBillID,descr,0 as lvl
  from dbo.bill b 
  where b.parentBillID is null
    union all
  select b.billID,b.parentBillID,b.descr,lvl + 1
  from dbo.bill b join c on b.parentBillID = c.billID)
select SUM(p.cost*pl.quantity) * 2.0
  from c
  join dbo.billPartList bpl on c.billID =     bpl.billID
  left join dbo.partList pl on     bpl.partListID= pl.partListID
  join dbo.part p on pl.partID = p.partID

The next query inverts the previous queries. Instead of traversing the data top-down from the topmost parent bill, you can see that it starts with a part, queries for the bill it belongs to, and then works its way up the hierarchy to find all the bills that use the part:

with c as (
  select b.descr,b.billID,b.parentBillID,0     as lvl
  from dbo.partList pl
  left join dbo.billPartList bpl on    pl.partListID = bpl.partListID
  left join dbo.bill b on bpl.billID =     b.billID
  where pl.partID = 19
    union all
  select b.descr,b.billID,b.parentBillID,lvl+1
  from dbo.bill b
  join c on c.parentBillID = b.billID)
select * from c;

Visualizing how the bills inter-relate using the row-wise listing can prove difficult. In such cases, it makes sense to organize the bills into paths, as shown below. By doing so, you can easily visualize the nesting of bills:

with c as (
  select '/'+cast(billID as varchar(49)) as path,BillID
  from dbo.bill b 
  where b.parentBillID is null
    union all 
  select cast(c.path+'/'+CAST(b.billID as varchar(4)) as varchar(50)),    b.billID
  from dbo.bill b join c on b.parentBillID = c.billID)
select c.path+'/',b.descr
from c join dbo.bill b on c.billID = b.billID
order by 1;

You can see the resulting nest in Figure 5 .

fig05.gif

Figure 5 Listing the Nested BOMs (Click the image for a larger view)

The remaining queries demonstrate adding a new bill to the system—in this case, assume that the company is including a 4GB memory stick as part of "Summer Bonus Promotion." The code in Figure 6 inserts a new part, and the bill representing it, into the hierarchy.

Figure 6 Adding a New Part and Bill

begin transaction
  declare @mp int;
  declare @mpl int;
  declare @mb int;
  select @mp = MAX(partID)+10 from dbo.part;
  insert into dbo.part(partID,descr,cost) 
    output inserted.*
    values (@mp,'4GB USB 2.0 Memory Stick',20.0);
  select @mpl = MAX(partListID)+10 from dbo.partList;  
  insert into dbo.partList(partListID,partID,quantity)
    output inserted.*
    values (@mpl,@mp,1);
  select @mb = MAX(billID)+10 from dbo.bill;
  insert into dbo.bill(billID,parentBillID,descr)
    output inserted.* 
    values (@mb,110,'Summer Bonus Package')
  insert into dbo.billPartList(billID,partListID)
    output inserted.*
    values (@mb,@mpl);
commit;
go
select * from dbo.bill where parentBillID = 110;
select * from dbo.part;
go

Now suppose that somebody points out that the "Summer Bonus Package" bill should not be located directly beneath the total system bill, but rather under the Disk Drive module. This is an easy change to make in a relational database:

update dbo.bill set parentBillID = 320 where billID = 507
go
select * from dbo.bill where parentBillID = 320;
go

At the end of the summer promotion period, deleting this new bill is also simple:

begin transaction
  delete from dbo.billPartList where billID=507;
  delete from dbo.bill where billID=507;
  delete from dbo.part where partID=45;
commit;
go
select * from dbo.billPartList;
select * from dbo.bill;

Taking Advantage of HierarchyID

The HierarchyID data type is a CLR-based binary representation designed to store a compact, binary representation of an ordered path. Since it is a built-in data type there is no need to specifically activate the SQL/CLR functionality to use it. HierarchyID is useful whenever you need to represent a nested relationship between values where that relationship can be expressed in an ordered path syntax.

An ordered path looks a bit like a file path, but instead of using directory and file names, numeric values are used. Just like any other parent/child relationship, all ordered paths must be anchored by a root node. In SQL Server 2008, a single character (/) is used to textually represent the root node. Elements with the ordered path are normally represented by integers, but decimal values maybe used as well. The ordered path must terminate with another single character (/).

Ordered paths are not stored in the database in text form, however. Rather, they are mathematically hashed into binary values and those binary values are what is stored in the data pages.

How then do we need to modify our existing database to use HierarchyID? Figure 7 shows some of the ordered paths used by the example bill. Since the parent/child relationship in my sample data is found in the bill table, that is a logical place to start. Rather than modify that table, I will add a new table called bill2.

Figure 7 Paths Used in the Sample Bill of Materials

I will add a column named billPath, of the HierarchyID type. I store the parent/child relationships between bills—using ordered path format—in this column. And while I could at this point remove a column to navigate to a parent row, having this capability will still be helpful in many queries. Rather than store the parent ID as a static data field, however, I will show you how you can use a persisted, computed column based on the hierarchy ID for this type of navigation. The updated table creation code is as follows:

create table dbo.bill2(
  billPath HierarchyID not null,
  billID smallint not null,
  parentBillPath as billPath.GetAncestor(1) persisted,
  descr varchar(50) not null);

If you have not worked with data types based in the Microsoft .NET Framework before, the method invocation in the definition of parentBillPath might seem unusual. The ability to invoke a method defined by the type, optionally passing it one or more parameters, greatly simplifies the code.

Here you can see the HierarchyID type's GetAncestor method. The HierarchyID stores the path in a compact binary form. You simply cannot reference some segment of that binary value and treat that as a billID, nor can you directly fetch some sub-segment of the ordered path for a bill. For this, you can take advantage of the GetAncestor method. The parameter value here means "return the ordered path ending with this node's parent." The parameter value controls how far up in the ordered path the truncation occurs. So here you have the ordered path to the current bill's parent. Figure 8 summarizes the methods available on the HierarchyID data type and some typical use cases for each method.

Figure 8 HierarchyID Methods

Method Returns Parameter Values Purpose
GetAncestor A HierarchyID representing an ordered path for a parent or higher-level element. An integer indicating the number of levels above the level to traverse up in the current ordered path. Finds the parent, grandparent, or higher-level element in the ordered path for this instance.
GetDescendent A HierarchyID representing the path of a child, grandchild, or later child of the current node. Two HierarchyID instances, either or both of which may be null, constraining the potential child returned. Gets a path for inserting a new element at some depth into the current ordered path.
GetLevel A 16-bit integer value representing the total depth of the ordered path. None. Determines whether two ordered paths have the same depth.
GetRoot A HierarchyID for an ordered by path with zero elements. None. Finds the absolute root for any ordered path.
IsDescendantOf 1 if the ordered path passed in as a parameter is a child of the calling instance. A HierarchyID instance. Determines whether a given HierarchyID is a child of another instance.
Parse A HierarchyID instance. The textual representation of an ordered path. Creates a HierarchyID instance from a given path. Implicitly called whenever a HierarchyID instance is set to a string.
GetReparentedValue A HierarchyID representing a complete ordered path when moving the current item from one path to another. The current ordered path as a HierarchyID and the targeted ordered path, also as a HierarchyID. Moves one or more row values from one parent ordered path to another.
ToString The textual representation of a HierarchyID's ordered path. None. Analyzes the ordered path of a HierarchyID.

Next, you need to create a primary key on the new table as well as its foreign key constraint:

alter table dbo.bill2
  add constraint pkBill2
  primary key(billPath);

alter table dbo.bill2
  add constraint fkBill2Parent 
  foreign key (parentBillPath) references dbo.bill2(billPath);

In this case I am using HierarchyID values as both a primary key and foreign key. Remember that a HierarchyID represents a depth-first ordered path. Therefore, values will vary most between peers and less between a parent and child path. When using a HierarchyID value for the clustered index on a table, nodes along the same path will be stored closer together than nodes on other paths. In many cases, this is a good design because you want to query a given path for all of its children. However, if you are more frequently querying for specific nodes, you might want to create the clustered index on a column derived from GetLevel.

This particular implementation introduces a subtle problem. Because I am basing a foreign key constraint on a path rather than a value, I need an existing value in the table to anchor on. All chains of bills will require an anchor and, thankfully, they can all use the same one. The HierarchyID GetRoot method is actually the perfect way to solve this problem. Consider the following statement:

insert into dbo.bill2(billPath,billID,descr) 
  values (hierarchyID::GetRoot(),0,'All Bills'); 

Here I am creating a bill that represents the highest possible root in any hierarchy by calling the GetRoot method as a static member of the type. Now when I insert first-level nodes, they will still refer to a parent within the current table.

So just how do you migrate the existing bills from dbo.bill to dbo.bill2? All you really need to do is modify the previously shown query that selects the hierarchy path so that it casts that path to a HierarchyID and then inserts the data into dbo.bill2. Here is the T-SQL code:

with c(billID,[path],[descr]) as (
  select b.billID,'/'+CAST(b.billID as varchar(max)),descr
  from dbo.bill b
  where parentBillID is null
    union all
  select b.billID,c.path+'/'+CAST(b.billID as varchar(max)),b.descr
  from dbo.bill b
  join c on b.parentBillID = c.billID)
insert into dbo.bill2(billpath,billID,descr)
select path+'/' as [path],billID,descr 
from c
order by c.path;

Figure 9 shows the contents of dbo.Bill2.

Figure 9 The First 15 Rows of dbo.bill2

You will now also want to update the foreign key relationship with dbo.BillPartList as follows:

alter table dbo.billPartList
drop constraint fkBillPartList_Bill
go
alter table dbo.billPartList
  add constraint fkBillPartList_Bill2 
  foreign key(billPath) 
  references dbo.bill2(billPath)
  on delete cascade
  on update no action;

Testing the HierarchyID Implementation

With the tables created and populated and with constraints in place, I can rewrite the previous test queries to use the new HierarchyID type. The first query produced a hierarchical listing of the bills from the top (bill 110) down by using a recursive common table expression. When I use the HierarchyID data type, the complete linage for any given bill is already represented within that value. For example, one of the bottom-most bills in the hierarchy is bill 405, representing a CPU kit. How would I know what bills make up its lineage? This code will tell me:

select billPath.ToString() from dbo.bill2 where billID=405;

The output from that query is a string with the value "/110/210/310/405." Now the question becomes how, exactly, can I get the bills represented by that string? That is a bit trickier since the HierarchyID type does not offer a method for converting the list of values into a vector. However, you can write your own using either T-SQL or SQL/CLR. Here is the code for doing this using a T-SQL table-valued function:

create function dbo.Vectorize(@i hierarchyID)
returns @t table (position int identity(1,1) 
  not null,nodeValue int not null)
as begin
  declare @list varchar(max) = @i.ToString();
  declare @delimit int;
  set @list = substring(@list,2,len(@list)-1);
  while len(@list) > 1 begin
    set @delimit = charindex('/',@list);
    insert into @t values (cast(substring(@list,1,@delimit-1) as int));
    set @list = substring(@list,@delimit+1,len(@list));
  end;
  return;
end;

Now you have a simple way of examining the linage of any bill in the system with a single, non-recursive query:

declare @anyBill int = 405;
select b1.billPath,b2.billID,b2.descr from dbo.bill2 b1
  cross apply dbo.vectorize(b1.billPath) p  
  join dbo.bill2 b2 on b2.billID = p.nodeValue
  where b1.billID = @anyBill;

What about finding the parts used by a given bill? Unlike the prior implementation where I need to recursively query the list of bills, this query can take advantage of a HierarchyID's ability to determine whether one ordered path is a descendent of another by using the IsDescendantOf method:

declare @anyBill int = 110;
declare @sourceBillPath hierarchyID;
select @sourceBillPath = billPath from dbo.bill2 where billID = @anyBill;
select p.partID as 'partID', 
  p.descr 'partName' 
  from dbo.bill2 b2     
  left join dbo.billPartList bpl on b2.billID = bpl.billID
  left join dbo.partList pl on bpl.partListID = pl.partListID
  left join dbo.part p on pl.partID =p.partID
  where b2.billPath.IsDescendantOf(@sourceBillPath)=1
  and p.partID is not null
  order by p.partid;

Calculating the MSRP of the whole system or any sub-bill can be done with a few modifications to the technique that I showed you previously:

declare @anyBill int = 110;
declare @sourceBillPath hierarchyID;
select @sourceBillPath = billPath 
from dbo.bill2 where billID = @anyBill;
  select SUM(p.cost * pl.quantity)*2.0
  from dbo.bill2 b2     
  left join dbo.billPartList bpl on b2.billID = bpl.billID
  left join dbo.partList pl on bpl.partListID = pl.partListID
  left join dbo.part p on pl.partID =p.partID
  where b2.billPath.IsDescendantOf(@sourceBillPath)=1;

Determining which bills use a particular part can be queried using the GetAncestor method. That method allows you to query up a hierarchy. Here I need to use recursion to traverse upward through the list of bills:

declare @partID int = 20;
with c(billPath,descr) as (
  select b.billPath,b.descr
  from dbo.part p
  join dbo.partList pl on p.partID = pl.partID
  join dbo.billPartList bpl on pl.partListID = bpl.partListID
  join dbo.bill2 b on bpl.billID = b.billID
  where p.partID = @partID
    union all
  select b.billPath,b.descr
  from dbo.bill2 b
  join c on b.billPath = c.billPath.GetAncestor(1))
select distinct descr,billPath
from c
where billPath <> hierarchyID::GetRoot()
order by billPath;

Figure 6 added a new part and bill. No changes are needed in the code to insert the part or to create the part list. Creating a bill, however, does provide the opportunity to use the GetDescendant and GetReparentedValue methods of HierarchyID. I can start the process with the following code excerpt:

begin  
  begin transaction
  declare @mp int;
  declare @mpl int;
  select @mp = MAX(partID)+10 from dbo.part;
  insert into dbo.part(partID,descr,cost) 
    output inserted.*
    values (@mp,'4GB USB 2.0 Memory Stick',20.0);
  select @mpl = MAX(partListID)+10 from dbo.partList;  
  insert into dbo.partList(partListID,partID,quantity)
    output inserted.*
    values (@mpl,@mp,1);

With the part and parts list created, I can now create the related bill. The idea is to insert the new bill into the hierarchy to the right of all of the other bills making up the Super X100 Home Theatre System. To do that I need to know two things: the path for parent bill and the path for its right-most child.

Getting the path for the parent bill is a simple query. Finding its right-most child requires using the IsDescendantOf and GetLevel methods, like this:

declare @root hierarchyID;
declare @newBillPath hierarchyID;
declare @newBillID int;

select @root = billPath
  from dbo.bill2 
  where billID = 110;

select @newBillPath = max(billPath) 
  from dbo.bill2 
  where billPath.IsDescendantOf(@root) =1 
  and billPath.GetLevel() = @root.GetLevel()+1;

I am looking for all the root node's immediate descendants. Since paths are ordered by value, the bill with the maximum value for the HierarchyID will be the right-hand-most instance. Knowing the right-hand-most peer node, I now need to create a new path adjacent to it. This is where GetDescendant comes into use. The parameters to this method control where the new path is created. When I leave the second parameter in the list null here, I am saying "create the node to the right of the first parameter:"

select @newBillPath =@root.GetDescendant(@newBillPath,null);   

Now I have a path for the new bill, and that path will have a value for newBillID, so I just need to extract that. The table-valued function added earlier can help:

select @newBillID = nodeValue 
  from dbo.Vectorize(@newBillPath) 
  where position = @newBillPath.GetLevel();

Finally, I can finish this transaction by inserting the new bill into the dbo.bill2 table and inserting the billID and the partListID into the dbo.billPartList table:

insert into dbo.bill2
  output inserted.*
  values (@newBillPath,@newBillID,'Summer Bonus Package');

insert into dbo.billPartList(billID,partListID)
  output inserted.*
  values (@newBillID,@mpl);

commit;
end;

To move the Summer Bonus bill from being a direct child of the Super X100 Home Theatre System to being a child of the Disk Drive Module is not especially difficult provided that you do not want to change the billID (see Figure 10 ). The critical thing to remember about using GetReparentedValue is that calling it does not in and of itself affect any changes to values persisted in a table. Instead, you need to use an update to affect that change.

Figure 10 Moving a Bill

begin
begin transaction
declare @oldPath hierarchyID,@newRoot hierarchyID;
declare @newPath hierarchyID;
declare @billIDToMove int = 251;
declare @billIDToMoveTo int = 320;
select @oldPath = billPath
  from dbo.bill2 where BillID = @billIDToMove;
select @newRoot = b.billPath
  from dbo.bill2 b where billID = @billIDToMoveTo
select @newPath = @oldPath.GetReparentedValue(
  @oldPath.GetAncestor(1),@newRoot);
update dbo.bill2
  set billPath = @newPath
  output inserted.*
  where billID = 251;
commit;
end;

Deleting a part and its matchingID bills can be either a fairly simple process or it can require substantial effort. Much depends on whether the bill being deleted is the parent of another bill. If not, as in the case of the Summer Bonus bill, you can just delete the bill and the part:

begin
declare @partToDelete int = 45;
begin transaction
delete from dbo.bill2 where billID in
  (  select billID from dbo.billPartList bpl
  join dbo.partList pl on bpl.partListID = pl.partListID
  join dbo.part p on pl.partID = p.partID
  where p.partID = @partToDelete);
delete from dbo.part where partID = @partToDelete;    
commit;
end;

Deleting a bill that is the parent of one or more other bills in a design like the one presented here requires more complex code. The issue is that I have a foreign key constraint which requires each bill to have an immediate parent. If I try to delete a parent bill, an error is raised. To fix this, I must first move any children to a bill that will survive. To do that, I will have to temporarily drop the foreign key constraint and compute paths for the new nodes using the GetReparentedValue method.

Wrapping Up

The new HierarchyID type in SQL Server 2008 provides compact data types with the ability to store and work with node-identifying ordered paths. Almost any system that relies on a hierarchical data model can be implemented using this new data type.

The primary benefits are that you can reduce amount of data physically stored on disk and write less complex queries. Note that you can still take advantage of the native SQL Server support for cascading referential integrity constraints with this type as well. The key point to remember is that unlike traditional approaches to modeling a hierarchy using scalar values for a child's ID and a parent's ID, with HierarchyID, each node will contain its complete path as its identity.

Kent Tegels is the Data Curriculum Technical Lead at Developmentor. He develops and teaches classes for SQL Server, .NET, and XML.