NULLs Data Model and SQL Smells

Nulls - An SQL Smell?
Nulls – An SQL Smell?

NULLs are a perennial problem. Nobody likes them. They confuse developers and users and many analysts do not really understand them.

The concept of NULL allows us to say that there are things that we do not know.

In his article on SQL Smells, Phil Factor associates several smells with NULLs. In this post I’ll explain how to avoid using NULLs and how to use them properly when they are necessary.

Why do we need NULLs at all? What is the benefit and what is the cost?

NULL has a simple meaning with wide-ranging and surprising consequences. NULL means the value is unknown. And this in turn means that the result of any calculation or concatenation which uses this value must also be unknown!

NULL is a feature of SQL. The benefit of allowing NULL or “three valued logic” (TRUE/FALSE/UNKNOWN) is that it allows a database to record that there are things we do not know. The cost of having them is that any calculation or concatenation which uses this value must also be unknown! This confuses many people.

Reasons for needing NULL

There are many reasons why we might not have data to put into a column. Thinking about why we are considering defining a column as NULLable will encourage us to consider alternatives.

The structural NULL – Permanent sub-types

Customer with Sub-types which may cause NULLs
Customer with Sub-types which may cause NULLs

Sometimes we want to combine two entities into a single “super-type” table. There are attributes of Person will never be used for Business (and vice-versa). These missing values will need NULLs.

The structural NULL – Lifecycle subtypes

Order with Lifecycle sub-types which may cause NULLs
Order with Lifecycle sub-types which may cause NULLs

Something similar can happen if we combine all the steps of a entities lifecycle into a single table. The attributes of the later stages will always be empty (NULL) until that stage is reached. These later steps often contain dates or times.

In both cases involving sub-types it may be possible to splitting the sub-types into separate tables. Consider whether it is worth the effort and make sure you avoid the pitfalls of sub-types in SQL.

Data that will never be there

Attributes and Values for an "Address"
Attributes and Values for an “Address”

Entities like “Address” are frequently modelled with attributes like “AddressLine_”. In many cases there will never be values for the later lines. They will not be mandatory in the user interface, but do they need to be NULL? Consider whether allowing them to default to “spaces” or an empty string, would be better and whether it would have any bad effects.

Things which should never allow NULL

Always decide whether you expect an attribute to have a value. Don’t leave it to chance.

There are some things which should hardly ever allow NULL. This includes all keys and identifiers. Avoid allowing short titles or descriptions to be NULL (For long descriptions allowing NULLs is understandable).

Unavoidable NULLs

Attributes and Values for a "Person" Entity - Sometimes NULL is hard to avoid
Attributes and Values for a “Person” Entity – Sometimes NULL is hard to avoid

There are some attributes where allowing NULL is hard to avoid. Life insurance and pensions companies may need a “date of death” for their customers! Having a column with allows NULL is often the easiest way of handling this.

You should resist the temptation to use “magic dates” or inappropriate data-types in order to avoid allowing NULL. The consequences are far worse than the problem.

Summary

NULLs are a problem, nobody likes them but they are necessary. Many problems with NULLs can be avoided by two rules:

  • Remember that NULL means “unknown value” and this has consequences.
  • Ask “_why_ don’t we have this data?”

In many cases NULLs can be avoided by data modelling – that means the analyst has to do work in the Conceptual or Logical Model.

Where next?

The next article is about another smell: having the same name for different things!

Packing lists into SQL columns – SQL Smells

Packing Lists into SQL Columns creates an SQL Smell
Packing Lists into SQL Columns creates an SQL Smell

Some SQL data types are amazingly flexible. As a consequence people are tempted to put all kinds of data into character columns. In his article on SQL Smells Phil Factor identifies packing lists or complex data into a column as one of the “smells”. To be frank, it stinks! One row should contain one value for each column. That value should mean a single thing. Doing anything else is inviting problems.

Let’s look at how to recognise this particular “smell”, where it comes from, the consequences of allowing designs containing it and how to remove it. I’ll also touch on the limited circumstances when it is acceptable.

Recognising “packing lists into a column”

Examples of lists packed into SQL Columns
Examples of lists packed into SQL Columns

If you are an analyst or designer, working with the Conceptual Design or Requirements for the database then you will know when you are tempted to do this. My advice is – Don’t do it! The explanation will come later.
On the other hand, this smell can be hard to recognise if it actually gets into the database design. There will be evidence in three places:

  • There may be signs of “lists” in the database design. The names of affected columns may be plural, or something like “List_of_…”. The column is likely to be defined as a character type.
  • There will be evidence in the data. This is the easiest place to find the evidence. There will usually be a separator character between the different elements of the list, like “1,2,3,4”. Beware! Fixed length character columns, divided into fields, mimicking an ancient punched card are not entirely unknown.
  • The code will provide evidence. You will know it when you see it. The code will parse the offending column into separate values based on either a separator value or column positions.

Limitations of the packing approach

This approach is not using a relational database as intended. This will impose limitations on your system.

  • You will not be able to search on the individual fields without unpacking them.
  • You won’t be able to update the fields without unpacking them and then re-packing them.
  • Changes to the implied record structure will mean fundamental changes to the database and the associated code.
  • All the packing and unpacking will cause poor performance.

Excuses for packing a list into a column

There are two reasons you may be tempted to pack a list in this way:

  • You think it will somehow be “more efficient”. Take Phil Factor’s word for it. It won’t!
  • Another system expects something in this form. In this case you would be better to “do things properly” and do the conversion close to the interface with the other system. That way you limit the effect the other system is having on you.

The only acceptable excuse is that the data in the column is going to be treated as a black box. All your system is doing is storing it.

How to avoid packing a list into a column

There is one main strategy for avoiding packing a list (or other complex data) in a column. You should aim to understand the list or data you are packing into the column. Consider breaking the complex data out into a new entity. Apply the techniques of data modelling or normalization.

Summary

A column containing complex data indicates problem with the Conceptual Model of the database. You should review the data model and apply the rules of normalization.

Where next?

Having dealt with a problem poor data modelling, in the next article I’m going to look at a general problem of data design and an associated SQL Smell – “Using inappropriate data types”.