Some SQL data types are amazingly flexible. As a consequence people are tempted to put all kinds of data into character columns. In his article on SQL Smells Phil Factor identifies packing lists or complex data into a column as one of the “smells”. To be frank, it stinks! One row should contain one value for each column. That value should mean a single thing. Doing anything else is inviting problems.
Let’s look at how to recognise this particular “smell”, where it comes from, the consequences of allowing designs containing it and how to remove it. I’ll also touch on the limited circumstances when it is acceptable.
Recognising “packing lists into a column”
If you are an analyst or designer, working with the Conceptual Design or Requirements for the database then you will know when you are tempted to do this. My advice is – Don’t do it! The explanation will come later.
On the other hand, this smell can be hard to recognise if it actually gets into the database design. There will be evidence in three places:
There may be signs of “lists” in the database design. The names of affected columns may be plural, or something like “List_of_…”. The column is likely to be defined as a character type.
There will be evidence in the data. This is the easiest place to find the evidence. There will usually be a separator character between the different elements of the list, like “1,2,3,4”. Beware! Fixed length character columns, divided into fields, mimicking an ancient punched card are not entirely unknown.
The code will provide evidence. You will know it when you see it. The code will parse the offending column into separate values based on either a separator value or column positions.
Limitations of the packing approach
This approach is not using a relational database as intended. This will impose limitations on your system.
You will not be able to search on the individual fields without unpacking them.
You won’t be able to update the fields without unpacking them and then re-packing them.
Changes to the implied record structure will mean fundamental changes to the database and the associated code.
All the packing and unpacking will cause poor performance.
Excuses for packing a list into a column
There are two reasons you may be tempted to pack a list in this way:
Another system expects something in this form. In this case you would be better to “do things properly” and do the conversion close to the interface with the other system. That way you limit the effect the other system is having on you.
The only acceptable excuse is that the data in the column is going to be treated as a black box. All your system is doing is storing it.
How to avoid packing a list into a column
There is one main strategy for avoiding packing a list (or other complex data) in a column. You should aim to understand the list or data you are packing into the column. Consider breaking the complex data out into a new entity. Apply the techniques of data modelling or normalization.
A column containing complex data indicates problem with the Conceptual Model of the database. You should review the data model and apply the rules of normalization.
Having dealt with a problem poor data modelling, in the next article I’m going to look at a general problem of data design and an associated SQL Smell – “Using inappropriate data types”.
Have you heard of the “Entity Attribute Value” (EAV) Model or pattern? You may have, even if you don’t recognise the name.
The Entity Attribute Value pattern allows someone to add extra attributes to some entities. You know for sure that you have the EAV if you have entities or tables called:
You can substitute words like: property or feature instead of attribute. That’s right – you have an entity called attribute!
Phil Factor identifies the EAV as a potential “SQL Smell” in this article . I regard the Entity Attribute Value model as a “Requirements Smell” too. There are legitimate uses for EAV, but having an EAV may also indicate a problem. The problem my be with your Conceptual Model, or with the way it has been turned into a Logical Database Design.
My previous article dealt with tables which were very wide (have lots of columns). The EAV has the potential to produce a table (the “attribute_value” table) which is very narrow (typically only 3 or 4 columns) and is very long (with lots of rows).
Good Reasons to use an Entity Attribute Value (EAV) Model
The Entity Attribute Value model is not necessarily wrong. There are legitimate reasons for having an EAV:
Conscious modelling of data abstractions: EAVs can be used to take your requirements to a more abstract level. They are commonly found deep inside of “modelling tools”, CASE tools and other software packages which are intended to be configurable.
Consciously making the Conceptual Model “Extensible”: Allowing “user-defined” (or administrator defined) attributes for things is another legitimate use.
Anticipating very sparse data: EAV is a way of handling lots of NULLable columns. It uses space efficiently, but at the cost of more complex processing.
Bad Reasons to use an Entity Attribute Value (EAV) Model
It’s “cool”! I confess. I have done this. When I first discovered the EAV model, I tried to apply it everywhere. This is not a good idea.
Laziness is a very bad reason for using the EAV model. The argument goes something like: “we’re not sure what attributes the users need, so we will allow them to define their own”. The problem with this approach is that these “user-defined” attributes are hard to validate and process design becomes significantly harder.
Benefits of using an Entity Attribute Value (EAV) Model
Creates opportunities for re-use: Using the EAV model can create opportunities for reusing code. All those user-defined attributes are maintained by the same code. It can work very well with abstract Object-Oriented design.
Can make for very elegant and compact code: Well thought-out EAV code can be compact and elegant. This is one reason why you will find EAV models inside many packages.
Can make the data very compact: Using an EAV model can reduce the number of tables you need and the space the data takes up.
Disadvantages of using an Entity Attribute Value (EAV) Model
Data validation becomes harder: The value on the “Attribute_Value” table tends to be stored in a data-type like “varchar”. This makes validation of the data harder. Of course you can start to add validation yourself, but this is adding complexity.
Code becomes abstract and hard to understand: Code written to use an EAV always has to go through extra steps compared to having the column you want directly on the table.
Data becomes abstract and hard to understand. One solution to this is to add SQL views or a layer of code which transforms the abstract data in the EAV into something closer to what the business users are expecting.
The application may need “seed data” which is almost part of the code. This is what happens in some packages.
The application may require a complex “configuration” process. Again, this is what you find in some packages. You have to select which values
Performance: EAV requires 2 joins to get a value and the attribute name. This has performance implications.
Where the complexity and performance impact come from
We can imagine a simple EAV model where the “Entity” contains a single attribute called “Name”. Retrieving the values of “Name” is straightforward. Using the EAV model we can create user-defined attributes called: “Type, Colour, Length and Width”. We can record values for these attributes for any row in the entity table. It is hard to validate the data.
We can retrieve the value of a user-defined attributes using a JOIN. To get the “name” of attribute will require a second join. This can get messy!
Using the values in the EAV tables to identify rows in the Entity table is possible. This may present a challenge to the designers.
Summary – Does the Entity Attribute Value (EAV) Model smell bad?
I agree with Phil Factor. Think hard about whether you should use the EAV model. Using EAV inappropriately can have bad effects.
The Entity Attribute Value model may indicate the requirements are not understood. That is always a bad thing.
In the next article I’m going to look at another “SQL Smell”. Phil Factor calls this one “Polymorphic Association”.