Not using Referential Integrity – An SQL Smell!

Not Using Referential Integrity - SQL Smell?
Not Using Referential Integrity – SQL Smell?

In his article on SQL Smells, Phil Factor describes identifies two “smells” associated with Constraints in SQL. He regards not using Referential Integrity as an SQL Smell.

In this post I will explain what a Constraint is, and how one kind of Constraint provides Referential Integrity.

What are CONSTRAINTs and what do they do for us?

A Constraint limits the values which are allowed in a column. The SQL database manager will use constraints to validate data. There are several types of constraint.

Phil Factor thinks that not enforcing Referential Integrity is usually bad (and I agree).

Referential Integrity and the FOREIGN KEY REFERENCES Constraint

The FOREIGN KEY REFERENCES constraint on a column makes it a Foreign Key to another table. The Foreign Key must be present as a Primary Key in the referenced table. This is called Referential Integrity.

In the illustration, it is not possible for the Order table to contain a CustomerID which does not exist in the table Customer.

Enforcing Referential Integrity with REFERENCES
Enforcing Referential Integrity with REFERENCES

This is very powerful. The constraint makes the database manager check every change. Statements which would violate the Foreign Key constraint are rejected.

Performing the validation costs effort.

Phil Factor points out an additional benefit. The database manager may be able to use the Foreign Keys to improve performance of some queries.

Why wouldn’t you use Referential Integrity?

The arguments most frequently used for not implementing referential integrity are:

  • Performance: I have already mentioned the “cost” of checking the constraints. This argument:
    • Prejudice against logic in the database
    • A desire to minimise the “load” on the database manager.
  • Dirty Data: An “upstream” system may provide data which violates the constraint. Accepting this argument may mean importing faulty data, and the associated problems into your system!

“Staging tables” may be a better solution to this problem.

Both of these arguments are valid in some circumstances. There are times alternatives are better. Ask the following questions:

  • How are you going to perform the validation?
  • Which system components are going to perform the validation?
  • Do you expect the alternative solution to perform better?
  • Will the alternative solution be better in some way?

Not having answers to these questions is not really acceptable. Try and make a rational decision, based on numbers.

How Referential Integrity and Requirements interact

The Conceptual Data Model identifies business entities and relationships. Those relationships define the referential integrity requirements.

Not using referential integrity implies that:

  • The system is going to allow invalid data, or
  • The system is going to validate the data in some other way.

Summary

Foreign Key Constraints exclude some invalid data from the system. Orphaned records (eg non-existent Customers) are impossible.

Exceptions to this rule require a rational justification.

Where next?

In the next article I will look at “Check Constraints” which are a different way of ensuring valid data.

Do “Entity Attribute Value” (EAV) Models smell bad?

Does the Entity Attribute Value (EAV) model an SQL smell?
Does the Entity Attribute Value (EAV) model have an SQL smell?

Have you heard of the “Entity Attribute Value” (EAV) Model or pattern? You may have, even if you don’t recognise the name.

The Entity Attribute Value pattern allows someone to add extra attributes to some entities. You know for sure that you have the EAV if you have entities or tables called:

  • …attribute
  • …attribute_value

You can substitute words like: property or feature instead of attribute. That’s right – you have an entity called attribute!

Phil Factor identifies the EAV as a potential “SQL Smell” in this article . I regard the Entity Attribute Value model as a “Requirements Smell” too. There are legitimate uses for EAV, but having an EAV may also indicate a problem. The problem my be with your Conceptual Model, or with the way it has been turned into a Logical Database Design.

My previous article dealt with tables which were very wide (have lots of columns). The EAV has the potential to produce a table (the “attribute_value” table) which is very narrow (typically only 3 or 4 columns) and is very long (with lots of rows).

Good Reasons to use an Entity Attribute Value (EAV) Model

The Entity Attribute Value model is not necessarily wrong. There are legitimate reasons for having an EAV:

  • Conscious modelling of data abstractions: EAVs can be used to take your requirements to a more abstract level. They are commonly found deep inside of “modelling tools”, CASE tools and other software packages which are intended to be configurable.
  • Consciously making the Conceptual Model “Extensible”: Allowing “user-defined” (or administrator defined) attributes for things is another legitimate use.
  • Anticipating very sparse data: EAV is a way of handling lots of NULLable columns. It uses space efficiently, but at the cost of more complex processing.

Bad Reasons to use an Entity Attribute Value (EAV) Model

  • It’s “cool”! I confess. I have done this. When I first discovered the EAV model, I tried to apply it everywhere. This is not a good idea.
  • Laziness is a very bad reason for using the EAV model. The argument goes something like: “we’re not sure what attributes the users need, so we will allow them to define their own”. The problem with this approach is that these “user-defined” attributes are hard to validate and process design becomes significantly harder.

Benefits of using an Entity Attribute Value (EAV) Model

  • Creates opportunities for re-use: Using the EAV model can create opportunities for reusing code. All those user-defined attributes are maintained by the same code. It can work very well with abstract Object-Oriented design.
  • Can make for very elegant and compact code: Well thought-out EAV code can be compact and elegant. This is one reason why you will find EAV models inside many packages.
  • Can make the data very compact: Using an EAV model can reduce the number of tables you need and the space the data takes up.

Disadvantages of using an Entity Attribute Value (EAV) Model

  • Data validation becomes harder: The value on the “Attribute_Value” table tends to be stored in a data-type like “varchar”. This makes validation of the data harder. Of course you can start to add validation yourself, but this is adding complexity.
  • Code becomes abstract and hard to understand: Code written to use an EAV always has to go through extra steps compared to having the column you want directly on the table.
  • Data becomes abstract and hard to understand. One solution to this is to add SQL views or a layer of code which transforms the abstract data in the EAV into something closer to what the business users are expecting.
  • The application may need “seed data” which is almost part of the code. This is what happens in some packages.
  • The application may require a complex “configuration” process. Again, this is what you find in some packages. You have to select which values
  • Performance: EAV requires 2 joins to get a value and the attribute name. This has performance implications.

Where the complexity and performance impact come from

How "user-defined" values are stored in the Entity Attribute Value (EAV) model
How “user-defined” values are stored in the Entity Attribute Value (EAV) model

We can imagine a simple EAV model where the “Entity” contains a single attribute called “Name”. Retrieving the values of “Name” is straightforward. Using the EAV model we can create user-defined attributes called: “Type, Colour, Length and Width”. We can record values for these attributes for any row in the entity table. It is hard to validate the data.

We can retrieve the value of a user-defined attributes using a JOIN. To get the “name” of attribute will require a second join. This can get messy!

SELECT will follow a different path for "user-defined" attributes in the EAV model
SELECT will follow a different path for “user-defined” attributes in the EAV model

Using the values in the EAV tables to identify rows in the Entity table is possible. This may present a challenge to the designers.

Summary – Does the Entity Attribute Value (EAV) Model smell bad?

I agree with Phil Factor. Think hard about whether you should use the EAV model. Using EAV inappropriately can have bad effects.

The Entity Attribute Value model may indicate the requirements are not understood. That is always a bad thing.

Where next?

In the next article I’m going to look at another “SQL Smell”. Phil Factor calls this one “Polymorphic Association”.

 

Process Models: Who is doing what?

Process Models: Who is doing what?

Process Models
Process Model Diagram

Every Business Analyst and Business Consultant should know something about Process Models and Process Modelling. Creating a simple model of what is happening outside the IT System can be a very useful place to start. You may even benefit when there is no IT System at all!

Ways of using Process Models

A Process Model is simply a representation of what the process is doing in the real world. This representation is usually graphical. There are different notations, and a large number of tools to help you draw the pictures.

Trivial Process Model - Documented using a Process Modelling tool
Trivial Process Model – Documented using a Process Modelling tool

Process Models can be used in a number of ways, many of which overlap.

  • A model can be used as a framework to assess the process against some criteria.
  • Models can be used to explore the effect of some change to the process.
  • Models can also be used to show how the physical world and IT interact.
    Using Process Models appropriately can help ensure that any changes are beneficial to the business.

Process Models: “As-Is” and “To-Be”

Process Models can be used to explore changes to a process. The “As-Is” model shows how the process works now, and the “To-Be” model shows how the process will work after the proposed changes. Comparing the two models allows us to demonstrate how the changes will be beneficial to the business.

The changes need not be changes to IT systems. The benefits which can be demonstrated may be the elimination of roles, or reduction in time or the number of steps.

Process Models: How does IT mesh with the business?

Gears - Process Models show how things interact
Gears – Process Models show how things interact

The Swim-lane process models demonstrate how different roles collaborate or people use several different tools or IT systems to perform their work. Imagine the different roles or tools as “gears” and you will understand what I mean. Using swim-lanes helps you to visualise and communicate how the different lanes interact.

Traps you can avoid using Process Models

Where a business process involves activities in the physical world (and not just doing things at a screen) then a process model can help put the IT systems into context. Doing this may prevent you spending time on details which are not important.

Ways Process Models can trap you!

Many process modelling tools allow you to break-down individual steps into smaller pieces. Resist the temptation to break things down too early, or everywhere. If you keep your models at a high level you will reduce the amount of work you have to do, and you will not reduce the value of the models.

Conclusion

Process Models can be used to put the IT system into a wider business context. They can even be used to analyse and rationalise processes where the IT systems do not play a significant role. Process Models are commonly used to demonstrate the claimed benefits of a new way of doing things, the so-called “As-Is” and “To-Be” models.

The keys to success with Process Models are to present the simplest model which is appropriate for your needs and to control the amount of detail.