• To represent organizational hierarchy
• To represent a family tree
• To store BOM (Bills of Material) information which represents collection of sub-components which make up a final component.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place.
Until SQL Server 2005, we had to create a self-referencing table to store hierarchical information, which gradually became more difficult and tedious to work with when hierarchical level grew to more than 2 or 3levels. The more levels, the more difficult to query as it required lot of joins, complex recursive logic. Though with the introduction of CTEs (Common Table Expression) in SQL Server 2005, we were not required to write iterative logic anymore to get a node (i.e. a Manager) and all its child nodes (i.e. all subordinates), but the performance of the CTE method is not that good.
Now the question is given a manager (ReportsTo) ID, get all employees who are directly or indirectly reporting to him, let's see how we can do this in different versions of SQL Server.
--SQL Server 2005
--Get all employees who directly/indirectly report to EmpID = 2
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 2
WHERE emp.EmpID = 2
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy
--Get all employees who directly/indirectly report to EmpID = 4
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 4
WHERE emp.EmpID = 4
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy
Even though SQL Server 2005, using CTE, simplified the process of retrieving hierarchical data from the SQL Server but still it has performance penalty. Not only this, the major drawback was still existent, I mean it was a nightmare to move child/sub-child records from one parent to another because of some organizational restructuring/realignment. Luckily with HIERARCHYID in SQL Server 2008 we would not be having those problems anymore. So now let's turn our attention to learn about this new system data type in details.
The new HIERARCHYID data type has been introduced to SQL Server 2008.
The following are the methods in the SQL Server 2008 Database Engine to support HierarchyID data type.
GetAncestor()
This method is useful to find the (nth ancestor of the given child node.
Syntax: child.GetAncestor ( n )
GetDescendant()
This method is very useful to get the descendant of a given node. It has a great significance in terms of finding the new descendant position get the descendants etc.
Syntax: parent.GetDescendant ( child1 , child2 )
This function returns one child node that is a descendant of the parent.
1.If parent is NULL, returns NULL.
2.If parent is not NULL, and both child1 and child2 are NULL, returns a child of parent.
3.If parent and child1 are not NULL, and child2 is NULL, returns a child of parent greater than child1.
4.If parent and child2 are not NULL and child1 is NULL, returns a child of parent less than child2.
5.If parent, child1, and child2 are all not NULL, returns a child of parent greater than child1 and less than child2.
6.If child1 or child2 is not NULL but is not a child of parent, an exception is raised.
7.If child1 >= child2, an exception is raised.
GetLevel()
This method is useful to find the Level of the current node.
Syntax: node.GetLevel ( )
This function will return an integer that represents the depth of this node in the current tree.
GetRoot()
This method will return the root of the hierarchy tree and this is a static method if you are using it within CLR.
Syntax: hierarchyid::GetRoot ( )
It will return the data type hierarchyID.
IsDescendant()
This method returns true/false (BIT) if the node is a descendant of the parent.
Syntax: parent.IsDescendant ( child )
Parse()
Parse converts the canonical string representation of a hierarchyid to a hierarchyid value. Parse is called implicitly when a conversion from a string type to hierarchyid occurs. Acts as the opposite of ToString(). Parse() is a static method.
Syntax: hierarchyid::Parse ( input )
Read()
Read reads binary representation of SqlHierarchyId from the passed-in BinaryReader and sets the SqlHierarchyId object to that value. Read cannot be called by using Transact-SQL. Use CAST or CONVERT instead.
Syntax: void Read( BinaryReader r )
Reparent()
This is a very useful method which helps you to reparent a node i.e. suppose if we want to align an existing node to a new parent or any other existing parent then this method is very useful.
Syntax: node.Reparent ( oldRoot, newRoot )
ToString()
This method is useful to get the string representation of the HierarchyID. The method returns a string that is a nvarchar(4000) data type.
Syntax: node.ToString ( )
Write()
Write writes out a binary representation of SqlHierarchyId to the passed-in BinaryWriter. Write cannot be called by using Transact-SQL. Use CAST or CONVERT instead.
Syntax: void Write( BinaryWriter w )
--First I will create a table which uses new data type
CREATE TABLE Employees_Ver2008
(
EmpID INT PRIMARY KEY NOT NULL IDENTITY,
EmpName VARCHAR(255) NOT NULL,
Title VARCHAR(255) NOT NULL,
HierarchicalLevel HIERARCHYID --Note the new data type here
)
GO
--Insert the Root node first in the table
--HIERARCHYID::GetRoot() is static method which returns the root node of a hierarchy
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('CTO', 'Chief Technology Officer', HIERARCHYID::GetRoot())
GO
--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position]
FROM Employees_Ver2008
GO
--Insert the first child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Dev1', 'Managar -Dev', @CTONode.GetDescendant(NULL, NULL))
GO
--Now let's insert the second child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @CTONode
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Test1', 'Manager- Test', @CTONode.GetDescendant(@LastChildPosition, NULL))
GO
--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position] FROM Employees_Ver2008
GO
CREATE PROCEDURE AddEmployee_Ver2008
(@ReportsToID INT, @EmpName VARCHAR(255), @Title VARCHAR(255))
AS
BEGIN
-- Get the root node we wish to insert a descendant of
DECLARE @ReportsToNode HIERARCHYID
SELECT @ReportsToNode = HierarchicalLevel FROM Employees_Ver2008
WHERE EmpID = @ReportsToID
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @ReportsToNode
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES (@EmpName, @Title, @ReportsToNode.GetDescendant(@LastChildPosition, NULL))
END
GO
--Let's add remaining nodes of tree using created stored procedure
EXECUTE AddEmployee_Ver2008 2, 'Lead_D1', 'Team Lead- Dev'
and so on...
--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employees_Ver2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employees_Ver2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1
Now let's turn our attention from inserting records to retrieve it. The problem statement is:
Get all employees who directly/indirectly report to a particular Employee.
Get all managers of a given employee in all levels in the management chain until the root node.
--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1
--Get all managers of a given employee(EmpID=4) in all levels in
--the management chain until the root node
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE @ReportsToID.IsDescendantOf(HierarchicalLevel) = 1
Now let's see the last example of HIERARCHYID here. The requirement is Lead_D1 has left the job so all his direct reports now will report to new Lead_D3. To accomplish this task you need to write your script which will look something like this.
--Adding new Dev Lead
EXECUTE AddEmployees_Ver20082, 'Lead_D1','DEV Leader'
--Get the HierarchyID of New DEV Lead
DECLARE @NewDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = @@IDENTITY)
--Get the HierarchyID of old DEV Lead
DECLARE @OldDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = 4)
UPDATE Employee_2008
SET HierarchicalLevel = HierarchicalLevel.GetReparentedValue(@OldDevLead, @NewDevLead)
WHERE HierarchicalLevel.IsDescendantOf(@OldDevLead)= 1
--Parent is considered its own descendant, so you need to exclude it
AND HierarchicalLevel <> @OldDevLead
HIERARCHYID data type does not ensure uniqueness; I mean there's nothing stopping you from inserting a new row into the Employee table with the same position within the hierarchy as the other nodes unless you define unique constraint on the column.
The HIERARCHYID column can be indexed in two ways
Depth First Strategy: By default when you create an index on HIERARCHYID data type column, it uses Depth First Strategy. In a depth-first index strategy, rows in a sub tree are stored near each other. For example, all employees that report through a manager are stored near their managers' record.
Breadth First Strategy: A breadth-first strategy stores the rows at each level of the hierarchy together. For example, the records of employees who directly report to the same manager are stored near each other. Also to create a bread first index, the system needs to know the level of each record in table; for that purpose you can use GetLevel function of HIERARCHYID.
1 comment:
Many thanks, this is much more helpful than the MS Help. However, one point - in your last example, should the first line;
EXECUTE AddEmployees_Ver20082, 'Lead_D1','DEV Leader
actually read
EXECUTE AddEmployees_Ver20082, 'Lead_D3','DEV Leader
Doesn't seem to make sense otherwise.
Post a Comment