Lesson 1: Converting a Table to a Hierarchical Structure
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Customers who have tables using self joins to express hierarchical relationships can convert their tables to a hierarchical structure using this lesson as a guide. It is relatively easy to migrate from this representation to one using hierarchyid. After migration, users will have a compact and easy to understand hierarchical representation, which can be indexed in several ways for efficient queries.
This lesson, examines an existing table, creates a new table containing a hierarchyid column, populates the table with the data from the source table, and then demonstrates three indexing strategies. This lesson contains the following topics:
Prerequisites
To complete this tutorial, you need SQL Server Management Studio, access to a server that's running SQL Server, and an AdventureWorks database.
- Install SQL Server Management Studio.
- Install SQL Server 2017 Developer Edition.
- Download AdventureWorks sample databases.
Instructions for restoring databases in SSMS are here: Restore a database.
Examine the current structure of the employee table
The sample AdventureWorks2022
database contains an Employee table in the HumanResources schema. To avoid changing the original table, this step makes a copy of the Employee table named EmployeeDemo. To simplify the example, you only copy five columns from the original table. Then, you query the HumanResources.EmployeeDemo table to review how the data is structured in a table without using the hierarchyid data type.
Copy the Employee table
- In a Query Editor window, run the following code to copy the table structure and data from the Employee table into a new table named EmployeeDemo. Since the original table already uses hierarchyid, this query essentially flattens the hierarchy to retrieve the manager of the employee. In subsequent parts of this lesson we will be reconstructing this hierarchy.
USE AdventureWorks2022;
GO
if OBJECT_ID('HumanResources.EmployeeDemo') is not null
drop table HumanResources.EmployeeDemo
SELECT emp.BusinessEntityID AS EmployeeID, emp.LoginID,
(SELECT man.BusinessEntityID FROM HumanResources.Employee man
WHERE emp.OrganizationNode.GetAncestor(1)=man.OrganizationNode OR
(emp.OrganizationNode.GetAncestor(1) = 0x AND man.OrganizationNode IS NULL)) AS ManagerID,
emp.JobTitle, emp.HireDate
INTO HumanResources.EmployeeDemo
FROM HumanResources.Employee emp ;
GO
Examine the structure and data of the EmployeeDemo table
This new EmployeeDemo table represents a typical table in an existing database that you might want to migrate to a new structure. In a Query Editor window, run the following code to show how the table uses a self join to display the employee/manager relationships:
SELECT Mgr.EmployeeID AS MgrID, Mgr.LoginID AS Manager, Emp.EmployeeID AS E_ID, Emp.LoginID, Emp.JobTitle FROM HumanResources.EmployeeDemo AS Emp LEFT JOIN HumanResources.EmployeeDemo AS Mgr ON Emp.ManagerID = Mgr.EmployeeID ORDER BY MgrID, E_ID
Here's the result set.
MgrID Manager E_ID LoginID JobTitle NULL NULL 1 adventure-works\ken0 Chief Executive Officer 1 adventure-works\ken0 2 adventure-works\terri0 Vice President of Engineering 1 adventure-works\ken0 16 adventure-works\david0 Marketing Manager 1 adventure-works\ken0 25 adventure-works\james1 Vice President of Production 1 adventure-works\ken0 234 adventure-works\laura1 Chief Financial Officer 1 adventure-works\ken0 263 adventure-works\jean0 Information Services Manager 1 adventure-works\ken0 273 adventure-works\brian3 Vice President of Sales 2 adventure-works\terri0 3 adventure-works\roberto0 Engineering Manager 3 adventure-works\roberto0 4 adventure-works\rob0 Senior Tool Designer ...
The results continue for a total of 290 rows.
Notice that the ORDER BY clause caused the output to list the direct reports of each management level together. For instance, all seven of the direct reports of MgrID 1 (ken0) are listed adjacent to each other. Although not impossible, it is much more difficult to group all those who eventually report to MgrID 1.
Populate a Table with Existing Hierarchical Data
This task creates a new table and populates it with the data in the EmployeeDemo table. This task has the following steps:
- Create a new table that contains a hierarchyid column. This column could replace the existing EmployeeID and ManagerID columns. However, you will retain those columns. This is because existing applications might refer to those columns, and also to help you understand the data after the transfer. The table definition specifies that OrgNode is the primary key, which requires the column to contain unique values. The clustered index on the OrgNode column will store the date in OrgNode sequence.
- Create a temporary table that is used to track how many employees report directly to each manager.
- Populate the new table by using data from the EmployeeDemo table.
To create a new table named NewOrg
In a Query Editor window, run the following code to create a new table named HumanResources.NewOrg:
CREATE TABLE HumanResources.NewOrg ( OrgNode hierarchyid, EmployeeID int, LoginID nvarchar(50), ManagerID int CONSTRAINT PK_NewOrg_OrgNode PRIMARY KEY CLUSTERED (OrgNode) ); GO
Create a temporary table named #Children
Create a temporary table named #Children with a column named Num that will contain the number of children for each node:
CREATE TABLE #Children ( EmployeeID int, ManagerID int, Num int ); GO
Add an index that will significantly speed up the query that populates the NewOrg table:
CREATE CLUSTERED INDEX tmpind ON #Children(ManagerID, EmployeeID); GO
Populate the NewOrg table
Recursive queries forbid subqueries with aggregates. Instead, populate the #Children table with the following code, which uses the ROW_NUMBER() method to populate the Num column:
INSERT #Children (EmployeeID, ManagerID, Num) SELECT EmployeeID, ManagerID, ROW_NUMBER() OVER (PARTITION BY ManagerID ORDER BY ManagerID) FROM HumanResources.EmployeeDemo GO
Review the #Children table. Note how the Num column contains sequential numbers for each manager.
SELECT * FROM #Children ORDER BY ManagerID, Num GO
Here's the result set.
EmployeeID ManagerID Num 1 NULL 1 2 1 1 16 1 2 25 1 3 234 1 4 263 1 5 273 1 6 3 2 1 4 3 1 5 3 2 6 3 3 7 3 4
Populate the NewOrg table. Use the GetRoot and ToString methods to concatenate the Num values into the hierarchyid format, and then update the OrgNode column with the resultant hierarchical values:
WITH paths(path, EmployeeID) AS ( -- This section provides the value for the root of the hierarchy SELECT hierarchyid::GetRoot() AS OrgNode, EmployeeID FROM #Children AS C WHERE ManagerID IS NULL UNION ALL -- This section provides values for all nodes except the root SELECT CAST(p.path.ToString() + CAST(C.Num AS varchar(30)) + '/' AS hierarchyid), C.EmployeeID FROM #Children AS C JOIN paths AS p ON C.ManagerID = P.EmployeeID ) INSERT HumanResources.NewOrg (OrgNode, O.EmployeeID, O.LoginID, O.ManagerID) SELECT P.path, O.EmployeeID, O.LoginID, O.ManagerID FROM HumanResources.EmployeeDemo AS O JOIN Paths AS P ON O.EmployeeID = P.EmployeeID GO
A hierarchyid column is more understandable when you convert it to character format. Review the data in the NewOrg table by executing the following code, which contains two representations of the OrgNode column:
SELECT OrgNode.ToString() AS LogicalNode, * FROM HumanResources.NewOrg ORDER BY LogicalNode; GO
The LogicalNode column converts the hierarchyid column into a more readable text form that represents the hierarchy. In the remaining tasks, you will use the
ToString()
method to show the logical format of the hierarchyid columns.Drop the temporary table, which is no longer needed:
DROP TABLE #Children GO
Optimizing the NewOrg Table
The NewOrd table that you created in the Populating a Table with Existing Hierarchical Data task contains all the employee information, and represents the hierarchical structure by using a hierarchyid data type. This task adds new indexes to support searches on the hierarchyid column.
The hierarchyid column (OrgNode) is the primary key for the NewOrg table. When the table was created, it contained a clustered index named PK_NewOrg_OrgNode to enforce the uniqueness of the OrgNode column. This clustered index also supports a depth-first search of the table.
Create index on NewOrg table for efficient searches
To help queries at the same level in the hierarchy, use the GetLevel method to create a computed column that contains the level in the hierarchy. Then, create a composite index on the level and the Hierarchyid. Run the following code to create the computed column and the breadth-first index:
ALTER TABLE HumanResources.NewOrg ADD H_Level AS OrgNode.GetLevel() ; CREATE UNIQUE INDEX EmpBFInd ON HumanResources.NewOrg(H_Level, OrgNode) ; GO
Create a unique index on the EmployeeID column. This is the traditional singleton lookup of a single employee by EmployeeID number. Run the following code to create an index on EmployeeID:
CREATE UNIQUE INDEX EmpIDs_unq ON HumanResources.NewOrg(EmployeeID) ; GO
Run the following code to retrieve data from the table in the order of each of the three indexes:
SELECT OrgNode.ToString() AS LogicalNode, OrgNode, H_Level, EmployeeID, LoginID FROM HumanResources.NewOrg ORDER BY OrgNode; SELECT OrgNode.ToString() AS LogicalNode, OrgNode, H_Level, EmployeeID, LoginID FROM HumanResources.NewOrg ORDER BY H_Level, OrgNode; SELECT OrgNode.ToString() AS LogicalNode, OrgNode, H_Level, EmployeeID, LoginID FROM HumanResources.NewOrg ORDER BY EmployeeID; GO
Compare the result sets to see how the order is stored in each type of index. Only the first four rows of each output follow.
Here's the result set.
Depth-first index: Employee records are stored adjacent to their manager.
LogicalNode OrgNode H_Level EmployeeID LoginID / 0x 0 1 adventure-works\ken0 /1/ 0x58 1 2 adventure-works\terri0 /1/1/ 0x5AC0 2 3 adventure-works\roberto0 /1/1/1/ 0x5AD6 3 4 adventure-works\rob0 /1/1/2/ 0x5ADA 3 5 adventure-works\gail0 /1/1/3/ 0x5ADE 3 6 adventure-works\jossef0 /1/1/4/ 0x5AE1 3 7 adventure-works\dylan0 /1/1/4/1/ 0x5AE158 4 8 adventure-works\diane1 /1/1/4/2/ 0x5AE168 4 9 adventure-works\gigi0 /1/1/4/3/ 0x5AE178 4 10 adventure-works\michael6 /1/1/5/ 0x5AE3 3 11 adventure-works\ovidiu0
EmployeeID-first index: Rows are stored in EmployeeID sequence.
LogicalNode OrgNode H_Level EmployeeID LoginID / 0x 0 1 adventure-works\ken0 /1/ 0x58 1 2 adventure-works\terri0 /1/1/ 0x5AC0 2 3 adventure-works\roberto0 /1/1/1/ 0x5AD6 3 4 adventure-works\rob0 /1/1/2/ 0x5ADA 3 5 adventure-works\gail0 /1/1/3/ 0x5ADE 3 6 adventure-works\jossef0 /1/1/4/ 0x5AE1 3 7 adventure-works\dylan0 /1/1/4/1/ 0x5AE158 4 8 adventure-works\diane1 /1/1/4/2/ 0x5AE168 4 9 adventure-works\gigi0 /1/1/4/3/ 0x5AE178 4 10 adventure-works\michael6 /1/1/5/ 0x5AE3 3 11 adventure-works\ovidiu0 /1/1/5/1/ 0x5AE358 4 12 adventure-works\thierry0
Note
For diagrams that show the difference between a depth-first index and a breadth-first index, see Hierarchical Data (SQL Server).
Drop the unnecessary columns
The ManagerID column represents the employee/manager relationship, which is now represented by the OrgNode column. If other applications do not need the ManagerID column, consider dropping it by using the following statement:
ALTER TABLE HumanResources.NewOrg DROP COLUMN ManagerID ; GO
The EmployeeID column is also redundant. The OrgNode column uniquely identifies each employee. If other applications do not need the EmployeeID column, consider dropping the index and then the column by using the following code:
DROP INDEX EmpIDs_unq ON HumanResources.NewOrg ; ALTER TABLE HumanResources.NewOrg DROP COLUMN EmployeeID ; GO
Replace the original table with the new table
If your original table contained any additional indexes or constraints, add them to the NewOrg table.
Replace the old EmployeeDemo table with the new table. Run the following code to drop the old table, and then rename the new table with the old name:
DROP TABLE HumanResources.EmployeeDemo ; GO sp_rename 'HumanResources.NewOrg', 'EmployeeDemo' ; GO
Run the following code to examine the final table:
SELECT * FROM HumanResources.EmployeeDemo ;
Next steps
The next article teaches you to create and manage data in a hierarchical table.
Go to the next article to learn more: