With Recursive in PostgreSQL

There are several concepts in real world, which in order to be modeled properly in a database we will need to use a self referential tables. Such concepts can be anything that looks like a “tree”. Think of employees and managers as an organizational chart, taxonomy systems such as animals or genetics, graphs like travel paths, and links between articles (Wikipedia for instance).

Once we have a self-referential table, we may need to query it to retrieve data that involves traversing the tree-like structure. This is where the WITH RECURSIVE statement in PostgreSQL can be extremely useful.
WITH RECURSIVE statements are composed of two queries: the anchor query and the recursive query. The anchor query serves as the starting point of the recursion, providing the initial rows for the result set. The recursive query builds upon the anchor query by defining how to traverse the tree-like structure and selecting the next set of rows to add to the result set. Together, these two queries enable recursive traversal of hierarchical data.

Let us explore some examples to attempt to understand this concept better.

Organization Chart

Let’s start with the example of an organizational chart. Suppose we have a table called employees that looks like this:

CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  manager_id INTEGER REFERENCES employees(id)
);

The manager_id column is a foreign key that references the id column of the same table, allowing us to create a self-referential table.

Let’s add some data here:

-- Insert top-level managers
INSERT INTO employees (name, manager_id)
VALUES ('Alice', NULL), ('Bob', NULL);

-- Insert direct reports of Alice
INSERT INTO employees (name, manager_id)
VALUES ('Charlie', (SELECT id FROM employees WHERE name = 'Alice')),
       ('Dave', (SELECT id FROM employees WHERE name = 'Alice'));

-- Insert direct reports of Bob
INSERT INTO employees (name, manager_id)
VALUES ('Eve', (SELECT id FROM employees WHERE name = 'Bob')),
       ('Frank', (SELECT id FROM employees WHERE name = 'Bob'));

-- Insert indirect reports of Alice
INSERT INTO employees (name, manager_id)
VALUES ('Mike', (SELECT id FROM employees WHERE name = 'Charlie')),
       ('Pam', (SELECT id FROM employees WHERE name = 'Dave'));

-- Insert indirect reports of Bob
INSERT INTO employees (name, manager_id)
VALUES ('Kelly', (SELECT id FROM employees WHERE name = 'Eve')),
       ('Ryan', (SELECT id FROM employees WHERE name = 'Frank'));

To retrieve the hierarchy of the employees, we can use the WITH RECURSIVE statement like this:

WITH RECURSIVE org_chart AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL -- start with top-level managers
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.level + 1
  FROM employees e
  JOIN org_chart o ON e.manager_id = o.id
)
SELECT o.id, o.name, m.name AS manager, o.level
FROM org_chart o
LEFT JOIN employees m ON o.manager_id = m.id
ORDER BY level, id;

In this example, we start with the top-level managers by selecting all employees where the manager_id is NULL. We also include a level column that tracks the depth of each employee in the hierarchy (in this case, the top-level managers have a level of 1).

The UNION ALL operator allows us to combine the anchor query and the recursive query. In the recursive query, we join the employees table with the org_chart table to retrieve the direct reports of each manager. We also increment the level by 1.

Finally in the SELECT statement we also LEFT JOIN employees to get the manager name.

The result of this query is a table that lists all employees, along with their manager and level in the hierarchy:

idnamemanagerlevel
1Alicenull1
2Bobnull1
3CharlieAlice2
4DaveAlice2
5EveBob2
6FrankBob2
7MikeCharlie3
8PamDave3
9KellyEve3
10RyanFrank3
Query results

Taxonomy

Let’s create another example, about terrier taxonomy.
Note that this taxonomy is a bit arbitrary for the shake of our example and you can find relevant info in the American Kennel Club website.

CREATE TABLE terriers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    parent_id INT
);

-- Level 1
INSERT INTO terriers (name, parent_id) VALUES
('Terriers', NULL);

-- Level 2
INSERT INTO terriers (name, parent_id) VALUES
('Large Terriers', (SELECT id FROM terriers WHERE name = 'Terriers')),
('Bull Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers'));

-- Bull Terriers
INSERT INTO terriers (name, parent_id) VALUES
('Miniature Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers')),
('Colored Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers')),
('White Bull Terriers', (SELECT id FROM terriers WHERE name = 'Bull Terriers'));

-- Large Terriers
INSERT INTO terriers (name, parent_id) VALUES
('Fox Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Airedale Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Bedlington Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Border Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Kerry Blue Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Irish Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Lakeland Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Norfolk Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Norwich Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Scottish Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers')),
('Welsh Terriers', (SELECT id FROM terriers WHERE name = 'Large Terriers'));

Here we have a table of terriers, with each row having a name and a parent_id that points to the row’s parent terrier (except for the root terrier, which has a NULL parent_id).

Again we can use the WITH RECURSIVE statement in PostgreSQL to traverse the tree-like structure of the Terriers table and retrieve data that involves traversing the hierarchy. The WITH RECURSIVE statement consists, just like in the previous example, of two queries: the anchor query and the recursive query.

The anchor query is the starting point of the recursion and provides the initial rows for the result set. In this case, the anchor query selects the root row of the Terriers table (where parent_id IS NULL).

The recursive query builds upon the anchor query and defines how to traverse the tree-like structure and select the next set of rows to add to the result set. This part selects all the terriers that have a parent_id that matches the id of a terrier that was selected in the previous iteration, and appends their id to the path array. This continues recursively until all the terriers have been selected.

WITH RECURSIVE terrier_tree(id, name, parent_id, path, path_string) AS (
    SELECT id, name, parent_id, ARRAY[id], name::text
    FROM terriers
    WHERE parent_id IS NULL
    UNION ALL
    SELECT terriers.id, terriers.name, terriers.parent_id, path || terriers.id,
        path_string || ' > ' || terriers.name
    FROM terriers
    JOIN terrier_tree ON terriers.parent_id = terrier_tree.id
)
SELECT name, path, path_string
FROM terrier_tree;

We also add, the path_string column to display the concatenated string of name values with a > separator. This allows us to display the hierarchical path of each terrier in a more readable format.

namepathpath_string
Terriers{1}Terriers
Bull Terriers{3}Bull Terriers
Large Terriers{1,2}Terriers > Large Terriers
Miniature Bull Terriers{3,4}Bull Terriers > Miniature Bull Terriers
Colored Bull Terriers{3,5}Bull Terriers > Colored Bull Terriers
White Bull Terriers{3,6}Bull Terriers > White Bull Terriers
Fox Terriers{1,2,7}Terriers > Large Terriers > Fox Terriers
Airedale Terriers{1,2,8}Terriers > Large Terriers > Airedale Terriers
Bedlington Terriers{1,2,9}Terriers > Large Terriers > Bedlington Terriers
Border Terriers{1,2,10}Terriers > Large Terriers > Border Terriers
Kerry Blue Terriers{1,2,11}Terriers > Large Terriers > Kerry Blue Terriers
Irish Terriers{1,2,12}Terriers > Large Terriers > Irish Terriers
Lakeland Terriers{1,2,13}Terriers > Large Terriers > Lakeland Terriers
Norfolk Terriers{1,2,14}Terriers > Large Terriers > Norfolk Terriers
Norwich Terriers{1,2,15}Terriers > Large Terriers > Norwich Terriers
Scottish Terriers{1,2,16}Terriers > Large Terriers > Scottish Terriers
Welsh Terriers{1,2,17}Terriers > Large Terriers > Welsh Terriers
Smooth Fox Terriers{1,2,7,18}Terriers > Large Terriers > Fox Terriers > Smooth Fox Terriers
Wire Fox Terriers{1,2,7,19}Terriers > Large Terriers > Fox Terriers > Wire Fox Terriers
Query Result

Self-referential tables are a powerful tool for modeling hierarchical data in a database, and the WITH RECURSIVE statement in PostgreSQL can be used to traverse these structures and retrieve the data you need. Whether you’re working with an organizational chart, a taxonomy system, or any other type of hierarchical data, self-referential tables and recursive queries can help you gain insights and make better decisions.