There are several concepts in real world, which in order to be modeled properly in a database we will need to use a self referential tables. Such concepts can be anything that looks like a “tree”. Think of employees and managers as an organizational chart, taxonomy systems such as animals or genetics, graphs like travel paths, and links between articles (Wikipedia for instance).
Once we have a self-referential table, we may need to query it to retrieve data that involves traversing the tree-like structure. This is where the WITH RECURSIVE statement in PostgreSQL can be extremely useful.
WITH RECURSIVE statements are composed of two queries: the anchor query and the recursive query. The anchor query serves as the starting point of the recursion, providing the initial rows for the result set. The recursive query builds upon the anchor query by defining how to traverse the tree-like structure and selecting the next set of rows to add to the result set. Together, these two queries enable recursive traversal of hierarchical data.
Let us explore some examples to attempt to understand this concept better.
Organization Chart
Let’s start with the example of an organizational chart. Suppose we have a table called employees
that looks like this:
CREATE TABLE employees ( id SERIAL PRIMARY KEY, name VARCHAR(50) NOT NULL, manager_id INTEGER REFERENCES employees(id) );
The manager_id
column is a foreign key that references the id
column of the same table, allowing us to create a self-referential table.
Let’s add some data here:
-- Insert top-level managers INSERT INTO employees (name, manager_id) VALUES ('Alice', NULL), ('Bob', NULL); -- Insert direct reports of Alice INSERT INTO employees (name, manager_id) VALUES ('Charlie', (SELECT id FROM employees WHERE name = 'Alice')), ('Dave', (SELECT id FROM employees WHERE name = 'Alice')); -- Insert direct reports of Bob INSERT INTO employees (name, manager_id) VALUES ('Eve', (SELECT id FROM employees WHERE name = 'Bob')), ('Frank', (SELECT id FROM employees WHERE name = 'Bob')); -- Insert indirect reports of Alice INSERT INTO employees (name, manager_id) VALUES ('Mike', (SELECT id FROM employees WHERE name = 'Charlie')), ('Pam', (SELECT id FROM employees WHERE name = 'Dave')); -- Insert indirect reports of Bob INSERT INTO employees (name, manager_id) VALUES ('Kelly', (SELECT id FROM employees WHERE name = 'Eve')), ('Ryan', (SELECT id FROM employees WHERE name = 'Frank'));
To retrieve the hierarchy of the employees, we can use the WITH RECURSIVE statement like this:
WITH RECURSIVE org_chart AS ( SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL -- start with top-level managers UNION ALL SELECT e.id, e.name, e.manager_id, o.level + 1 FROM employees e JOIN org_chart o ON e.manager_id = o.id ) SELECT o.id, o.name, m.name AS manager, o.level FROM org_chart o LEFT JOIN employees m ON o.manager_id = m.id ORDER BY level, id;
In this example, we start with the top-level managers by selecting all employees where the manager_id
is NULL. We also include a level
column that tracks the depth of each employee in the hierarchy (in this case, the top-level managers have a level of 1).
The UNION ALL operator allows us to combine the anchor query and the recursive query. In the recursive query, we join the employees
table with the org_chart
table to retrieve the direct reports of each manager. We also increment the level by 1.
Finally in the SELECT statement we also LEFT JOIN employees to get the manager name.
The result of this query is a table that lists all employees, along with their manager and level in the hierarchy:
id | name | manager | level |
---|---|---|---|
1 | Alice | null | 1 |
2 | Bob | null | 1 |
3 | Charlie | Alice | 2 |
4 | Dave | Alice | 2 |
5 | Eve | Bob | 2 |
6 | Frank | Bob | 2 |
7 | Mike | Charlie | 3 |
8 | Pam | Dave | 3 |
9 | Kelly | Eve | 3 |
10 | Ryan | Frank | 3 |
Taxonomy
Let’s create another example, about terrier taxonomy.
Note that this taxonomy is a bit arbitrary for the shake of our example and you can find relevant info in the American Kennel Club website.
CREATE TABLE terriers ( id SERIAL PRIMARY KEY, name VARCHAR(50), parent_id INT ); -- Level 1 INSERT INTO terriers (name, parent_id) VALUES ('Terriers', NULL); -- Level 2 INSERT INTO terriers (name, parent_id) VALUES ('Large Terriers', (SELECT id FROM terriers WHERE name = 'Terriers')), ('Bull Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')); -- Bull Terriers INSERT INTO terriers (name, parent_id) VALUES ('Miniature Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers')), ('Colored Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers')), ('White Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers')); -- Large Terriers INSERT INTO terriers (name, parent_id) VALUES ('Fox Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Airedale Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Bedlington Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Border Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Kerry Blue Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Irish Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Lakeland Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Norfolk Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Norwich Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Scottish Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')), ('Welsh Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers'));
Here we have a table of terriers, with each row having a name and a parent_id that points to the row’s parent terrier (except for the root terrier, which has a NULL parent_id).
Again we can use the WITH RECURSIVE
statement in PostgreSQL to traverse the tree-like structure of the Terriers table and retrieve data that involves traversing the hierarchy. The WITH RECURSIVE
statement consists, just like in the previous example, of two queries: the anchor query and the recursive query.
The anchor query is the starting point of the recursion and provides the initial rows for the result set. In this case, the anchor query selects the root row of the Terriers table (where parent_id IS NULL
).
The recursive query builds upon the anchor query and defines how to traverse the tree-like structure and select the next set of rows to add to the result set. This part selects all the terriers that have a parent_id
that matches the id
of a terrier that was selected in the previous iteration, and appends their id
to the path
array. This continues recursively until all the terriers have been selected.
WITH RECURSIVE terrier_tree(id, name, parent_id, path, path_string) AS ( SELECT id, name, parent_id, ARRAY[id], name::text FROM terriers WHERE parent_id IS NULL UNION ALL SELECT terriers.id, terriers.name, terriers.parent_id, path || terriers.id, path_string || ' > ' || terriers.name FROM terriers JOIN terrier_tree ON terriers.parent_id = terrier_tree.id ) SELECT name, path, path_string FROM terrier_tree;
We also add, the path_string
column to display the concatenated string of name
values with a >
separator. This allows us to display the hierarchical path of each terrier in a more readable format.
name | path | path_string |
---|---|---|
Terriers | {1} | Terriers |
Bull Terriers | {3} | Bull Terriers |
Large Terriers | {1,2} | Terriers > Large Terriers |
Miniature Bull Terriers | {3,4} | Bull Terriers > Miniature Bull Terriers |
Colored Bull Terriers | {3,5} | Bull Terriers > Colored Bull Terriers |
White Bull Terriers | {3,6} | Bull Terriers > White Bull Terriers |
Fox Terriers | {1,2,7} | Terriers > Large Terriers > Fox Terriers |
Airedale Terriers | {1,2,8} | Terriers > Large Terriers > Airedale Terriers |
Bedlington Terriers | {1,2,9} | Terriers > Large Terriers > Bedlington Terriers |
Border Terriers | {1,2,10} | Terriers > Large Terriers > Border Terriers |
Kerry Blue Terriers | {1,2,11} | Terriers > Large Terriers > Kerry Blue Terriers |
Irish Terriers | {1,2,12} | Terriers > Large Terriers > Irish Terriers |
Lakeland Terriers | {1,2,13} | Terriers > Large Terriers > Lakeland Terriers |
Norfolk Terriers | {1,2,14} | Terriers > Large Terriers > Norfolk Terriers |
Norwich Terriers | {1,2,15} | Terriers > Large Terriers > Norwich Terriers |
Scottish Terriers | {1,2,16} | Terriers > Large Terriers > Scottish Terriers |
Welsh Terriers | {1,2,17} | Terriers > Large Terriers > Welsh Terriers |
Smooth Fox Terriers | {1,2,7,18} | Terriers > Large Terriers > Fox Terriers > Smooth Fox Terriers |
Wire Fox Terriers | {1,2,7,19} | Terriers > Large Terriers > Fox Terriers > Wire Fox Terriers |
Self-referential tables are a powerful tool for modeling hierarchical data in a database, and the WITH RECURSIVE statement in PostgreSQL can be used to traverse these structures and retrieve the data you need. Whether you’re working with an organizational chart, a taxonomy system, or any other type of hierarchical data, self-referential tables and recursive queries can help you gain insights and make better decisions.