SQL Hierarchies – SQL Smells

SQL Hierarchy - Hierarchies are a problem in SQL - An SQL smell
SQL Hierarchy – Hierarchies are a problem in SQL – An SQL smell

Phil Factor describes “Storing the hierarchy structure in the same table as the entities that make up the hierarchy”, and hierarchies as an SQL Smell. It is one of the longer titles in his article!

If you are not familiar with this construction, it is when a table refers back to itself using a “one-to-many” relationship.

Hierarchies are everywhere in the real world: product categories, nested geographic areas and organisation structures are all reasons we use hierarchies. Hierarchies are essential in order to model real business requirements, but they are difficult to get right. They are difficult to explain to business people and they can be difficult to specify properly to developers. When you decide you need them, then you should look at them closely.

I go a stage further than Phil Factor and say that all requirements for hierarchies should be reviewed, because they can be tricky.

How does a single table represent a hierarchy?

Product Hierarchy as a Single SQL table
Product Hierarchy as a Single SQL table

The figure shows a single table (called “Product_Category”). Each row can “point at” its parent using a Foreign Key in the “ParentCategoryId” column. In the example: Bikes, Components, Clothing and Accessories are all top-level categories. Mountain Bikes, Road Bikes and Touring Bikes are all sub-categories of Bikes. You can see how the row for Mountain Bikes “points at” Bikes.

This is the construction Phil Factor regards as an “SQL Smell” in his article. He says:

  • This approach “mixes relationships and values” (a philosophical objection), and
  • He states that the “closure table” pattern is more suitable for modelling real-life hierarchies.

I agree with Phil Factor on both counts. I’m not entirely against using the single table hierarchy, because it can be very simple and effective, but we need to recognise its limitations. People use the single table hierarchy because they are seduced by its apparent elegance and because they are not aware of any alternatives.

How the “closure table” represents a hierarchy in two tables

Two ways of modelling a hierarchy in SQL tables - single self-referencing table and closure table
Two ways of modelling a hierarchy in SQL tables

A two table “closure table” design is my (and Phil Factor’s) preferred way of storing hierarchies. In this pattern the information about the categories is held in the Category table and the information about the relationships between categories is held as foreign keys in the Product_Category_Closure table.

I like the “closure table” pattern because it is extremely flexible. There is an excellent technical description of how to use it here.

I am still cautious about the “closure table”, because we need to make sure we understand the requirements for the hierarchy we are designing.

Hierarchies as a Requirements Smell for Analysts

Phil Factor identifies the “Single Table Hierarchy” as an SQL Smell. I would go further and identify Hierarchies in general as a Requirements Smell. We can spot hierarchies at the earliest stages of creating the Conceptual Model of the database.

The issue is not “having Hierarchies” in your model. The issue is that both the “Single Table” and “Closure Table” patterns may seduce us into thinking that we have understood what the Business wants, or what the System needs to do, because we have added a table or two into our model.

Hierarchies are an area where the model of the data structure we use in the system can have a real impact (for good and bad) on what the system is capable of.

  • It is not really reasonable to expect business people to understand the nuances of choosing one Conceptual Model over another.
  • Neither is it right to expect Developers or Database designers to decide the importance of things like sequencing of categories, or rules about “depth”.
  • This is especially a problem if the Analyst has created a Model which contains a single table linked to itself and used that as a short-hand for “we need some sort of hierarchy here”, but has not investigated it or specified what it is.

Consequently, there is a danger here that we can either get into “analysis paralysis”, investigating hierarchies, on the one hand, or creating a database with an over-simplified or over complex solution for the hierarchy on the other.

How an analyst should approach hierarchies

In order to manage the risks of “analysis paralysis” or an inappropriate design, I suggest the following approach:

  • Recognise that the Business People may not fully understand the consequences of what they are agreeing to.
  • Make sure you illustrate the definition of the hierarchy with realistic examples. It won’t be wasted effort. The examples will be good for reflecting back the Requirements to the Business and will be useful for explaining the Requirements to whoever is designing the database. The examples will also be useful as the basis for test data.
  • The Business may review their requirements after they have seen the hierarchy implementation. That makes hierarchies a likely candidate for iterative development whether we want it or not!

Summary

Hierarchies are normal and essential Business Requirements. The “single table hierarchy” pattern may over-simplify Requirements, and even the “Closure Table” pattern may seduce us into specifying a design which we do not fully understand.

Hierarchies are a Requirements Smell. If the business needs hierarchies we should not try to eliminate them. Hierarchies deserve particular attention. They should be recognised as a potential risk in the project plan.

Where next?

“Hierarchies” are a problem with understanding real requirements and then converting a Conceptual Model into a Logical Model. They are an unavoidable Requirements Smell.

In the next article I’m going to look deeper than usual into table definitions. I’m going to consider the SQL Smells and Requirements Smells around – “SQL Constraints – for Referential Integrity and for Column Values”.