NULLs Data Model and SQL Smells

Nulls - An SQL Smell?
Nulls – An SQL Smell?

NULLs are a perennial problem. Nobody likes them. They confuse developers and users and many analysts do not really understand them.

The concept of NULL allows us to say that there are things that we do not know.

In his article on SQL Smells, Phil Factor associates several smells with NULLs. In this post I’ll explain how to avoid using NULLs and how to use them properly when they are necessary.

Why do we need NULLs at all? What is the benefit and what is the cost?

NULL has a simple meaning with wide-ranging and surprising consequences. NULL means the value is unknown. And this in turn means that the result of any calculation or concatenation which uses this value must also be unknown!

NULL is a feature of SQL. The benefit of allowing NULL or “three valued logic” (TRUE/FALSE/UNKNOWN) is that it allows a database to record that there are things we do not know. The cost of having them is that any calculation or concatenation which uses this value must also be unknown! This confuses many people.

Reasons for needing NULL

There are many reasons why we might not have data to put into a column. Thinking about why we are considering defining a column as NULLable will encourage us to consider alternatives.

The structural NULL – Permanent sub-types

Customer with Sub-types which may cause NULLs
Customer with Sub-types which may cause NULLs

Sometimes we want to combine two entities into a single “super-type” table. There are attributes of Person will never be used for Business (and vice-versa). These missing values will need NULLs.

The structural NULL – Lifecycle subtypes

Order with Lifecycle sub-types which may cause NULLs
Order with Lifecycle sub-types which may cause NULLs

Something similar can happen if we combine all the steps of a entities lifecycle into a single table. The attributes of the later stages will always be empty (NULL) until that stage is reached. These later steps often contain dates or times.

In both cases involving sub-types it may be possible to splitting the sub-types into separate tables. Consider whether it is worth the effort and make sure you avoid the pitfalls of sub-types in SQL.

Data that will never be there

Attributes and Values for an "Address"
Attributes and Values for an “Address”

Entities like “Address” are frequently modelled with attributes like “AddressLine_”. In many cases there will never be values for the later lines. They will not be mandatory in the user interface, but do they need to be NULL? Consider whether allowing them to default to “spaces” or an empty string, would be better and whether it would have any bad effects.

Things which should never allow NULL

Always decide whether you expect an attribute to have a value. Don’t leave it to chance.

There are some things which should hardly ever allow NULL. This includes all keys and identifiers. Avoid allowing short titles or descriptions to be NULL (For long descriptions allowing NULLs is understandable).

Unavoidable NULLs

Attributes and Values for a "Person" Entity - Sometimes NULL is hard to avoid
Attributes and Values for a “Person” Entity – Sometimes NULL is hard to avoid

There are some attributes where allowing NULL is hard to avoid. Life insurance and pensions companies may need a “date of death” for their customers! Having a column with allows NULL is often the easiest way of handling this.

You should resist the temptation to use “magic dates” or inappropriate data-types in order to avoid allowing NULL. The consequences are far worse than the problem.

Summary

NULLs are a problem, nobody likes them but they are necessary. Many problems with NULLs can be avoided by two rules:

  • Remember that NULL means “unknown value” and this has consequences.
  • Ask “_why_ don’t we have this data?”

In many cases NULLs can be avoided by data modelling – that means the analyst has to do work in the Conceptual or Logical Model.

Where next?

The next article is about another smell: having the same name for different things!

Indexes, Analysts and SQL Smells

SQL Indexes - A Goldilocks problem for analysts or an SQL Smell
SQL Indexes – A Goldilocks problem for analysts or an SQL Smell

Database Indexes are something which a lot of analysts ignore as being “too technical”. This is a pity.

Several SQL Smells in Phil Factor’s article point at possible bad decisions. Thinking in the  Logical Model can improve these decisions.

Choosing the correct indexes is a typical “Goldilocks Problem”: not too few, not too many, just the right number! Bad or inadequate requirements will contribute to designers making bad decisions. Phil Factor describes having the wrong indexes as an SQL Smell!

What is an Index and what does it do for us?

If you are an analyst, you may not know exactly what an index is. In non-technical terms, an index provides a quick way for the database manager to find the rows it needs in a table. There are several sorts of index.

An index is a “thing” in its own right. An index takes up space in the database. Updating an index costs effort. The main benefit of an index is that it makes select or read operations faster.

Types: Unique, Non-Unique, Clustered

There are three main kinds of index: Unique, Non-Unique and Clustered.

Imagine that we have a very simply database consisting of 3 tables:

  • Customer (not shown in the diagram)
  • Order
  • OrderLine
Two tables: Order and OrderLine - Where to put the indexes?
Two tables: Order and OrderLine – Where to put the indexes?

Unique Indexes:

Candidate Unique Indexes on an "Order" SQL table
Candidate Unique Indexes on an “Order” SQL table

In the Order table we can see three columns which might be used to identify the order

If you are not familiar with GUIDs, they are a way of assigning identifiers or “keys”. They are worth finding out about. It would be unusual to expect a human being to type in a GUID. An “OrderNum” (which paradoxically might contain letters!) would be more convenient for the users.

We expect all three: OrderId, GUID and OrderNum, to be unique. Therefore, all three are candidates for Unique Indexes. If an application attempts to create a duplicate value in a column which has a unique index, then the database manager will raise an error and reject the transaction.

Non-Unique Indexes for Foreign Keys

Candidate Indexes on an "OrderLine" SQL table
Candidate Indexes on an “OrderLine” SQL table

In the OrderLine table you can see two columns which identify things in other tables: OrderId and ProductId. These are Foreign Keys. In this case we cannot say they are unique, but they are candidates for non-unique indexes.

We could also have used OrderNum or OrderGUID as Foreign Keys into Order.

It is good practice for the rows in the OrderLine table to have a unique identifier. There are two common ways of doing this.

  • We can assign an OrderLineId (which is unique across the whole table) or
  • we can use the combination of OrderId and OrderLineNum which together would identify a row.

In this example, both OrderLineId and the OrderId and OrderLineNum combination are (seperate) candidate unique indexes.

Clustered Indexes

The order of the rows in an SQL table is specified by the Clustered Index. Each table can have only one clustered index. The clustered index must be unique.

People often make the “primary key” the clustered index, but it is worth considering other options. In the example, OrderLines can be added to an order after it has been created.

Using the OrderId, OrderLineNum index as the clustered index would make the database store all the “lines” for one order together (whenever they were added to the order). That may be more efficient for retrieval. Phil Factor identifies two smells with the choice of clustered indexes.

Non-Unique Indexes for Searching

Candidate Indexes on a "Customer" SQL table - A Non-Unique index on Name would help searches
Candidate Indexes on a “Customer” SQL table – A Non-Unique index on Name would help searches

Columns which will be used for searching should be considered candidates for a non-unique index.

The role of Analysts in choosing Indexes and Index types

Indexes are usually specified in the “Physical Model”. The analyst can help the database designer make the right decisions, by applying a little thought. The analyst should not try to pre-empt the designers decisions. They should aim to assist by identifying relevant “candidates”.

"<yoastmark

Summary

Indexes enforce business rules like uniqueness in an SQL database. They influence database performance. Considering candidate indexes in the Logical Model and even the Conceptual Model will help database designers make better decisions.

Where next?

The next article is about the smell of nothing, or “Nulls”. Nulls present problems for developers and database designers.

Simple ideas are the best – BA toolkit?

Today I stumbled across a blog post from a year ago where Ron Healy suggests creating a “Temporary Whiteboard” to carry about. The “whiteboard” is created by laminating a sheet of white paper! Sometimes simple ideas really are the best. You can’t get much simpler than that!

Ron suggests having parallel lines on one side which can be used for:

  • multifunction swimlane
  • activity diagram
  • sequence diagram
  • class diagram
  • hierarchical chart
  • system & subsystem diagram

This is such a good idea. As I commented on Ron’s blog – I’m a Business Analyst and I have been been using a daily plan made the same way for years and it didn’t occur to me to extend the idea.

Simple ideas - Laminated Daily Plans
Simple ideas – Laminated Daily Plans

I have two “plans” which I use which you can see in the picture. There is an A5 size one which is punched with two holes so I can have it at the front of an A4 ring binder and an A6 size one which I keep at the front of my Time Manager.

The layout of my plan is based on the forms I used to use in my Time Manager, but I like to remind myself of the “Elephant” and “Frog” tasks I am dealing with at the moment.

Once way in which I differ from Ron is that he uses dry wipe markers. On the other hand, I use a permanent marker and then wipe the plan down with alcohol at the end of the day when I am planning what to do the next day.

More simple ideas: What do you keep in your toolkit?

“Simple ideas” got me thinking about the Business Analysis or BA “toolkit” I carry around with me. For at least some of my working life I needed to use public transport, and that encourages you to carry the minimum you need.

My toolkit contains:

  • Laptop
  • Laminated plan
  • Diary
  • A4 ring binder to contain the current notes
  • Pack of Post-It notes
  • Pack of file cards
  • Foolscap or A4 wallet folder to hold loose bits together
  • A4 spiral bound notebook
  • Whiteboard Markers

What do you keep in your toolkit?

Bug Hunting for Analysts

Do you deal with problems in your working life? If you work in IT (Information Technology), then I am almost certain that you do!

“Problem Determination” is the art of identifying where the cause of a problem can be found in a system, so that it can be fixed. I like to call it “bug hunting”. Having a reputation for being able to track down the causes of problems will make you more valued. Being able to perform Problem Determination efficiently, or manage others doing Problem Determination will enhance your reputation.

I’ve written a course which teaches you a simple 8-step framework which can be used to manage the Problem Determination process, and techniques which will enable you to isolate problems efficiently. As you complete the sections what you have learned is reinforced by a case study and you can check your understanding using quizzes.

Problems are inevitable. Dealing with problems can be stressful. Take this course and you will learn how to track down problems efficiently and reduce that stress.

Whether you work on a help-desk, or as a developer, analyst or manager the skills you learn from this course will make you more effective. Problem Determination is a powerful intellectual skill which anyone with an analytical and practical mind can learn. It is a skill which once learned can be carried with you and applied to different systems and in different industries.

I teach using a mixture of lectures  and exercises. You will learn by seeing, hearing and doing. With each step you build your knowledge using what you have learned already. An on-line course sets you free to work at your own pace and to review and revisit earlier material, even after you have completed the course.

If I’ve got your interest, then I’ve included links so you can purchase the course at a substantial discount. Go on, have a look now! It’s all supported by an 30 day, no-questions-asked, money-back guarantee too.

Bug Hunting - 50% Off!
Bug Hunting – 50% Off!

EnrolButton

50% Off! $10

Why bother learning SQL?

Why would anyone want to learn a 35 year old programming language? Especially if the language we are talking about was originally intended to be written in ugly CAPITAL LETTERS and moreover tries to make you think in a funny way?

If the language in question is SQL (Structured Query Language), then I think almost anyone who is involved in IT systems should at least consider investing a little time gaining a basic understanding, because SQL databases are at the heart of many commercial systems.

Understanding even a little about what SQL is and how it works and the terminology it uses will enable you to communicate much more effectively with the other members of the development team. Even a little knowledge will give you an appreciation of what SQL databases are good for, what their limitations are and how they compare with other technologies. It doesn’t really matter, whether you start from a programming background or not, because at its heart SQL is really rather simple.

A little while ago, I wrote a course which teaches these basics. It teaches the principles of Relational Databases and the SQL language. Although it is intended for “Analysts”, what it teaches will be useful to programmers or those who are just interested in how systems work.

This course is suitable for a novice. I start by explaining how Relational Databases relate to your experience in the everyday world. When you have completed the course and exercises you will have created a database containing several related tables, updated the data they contain and written many queries which extract and summarise that data.

I teach using a mixture of lectures, demonstrations and exercises. You will learn by seeing, hearing and doing. With each step you build your knowledge using what you have learned already. An on-line course sets you free to work at your own pace and to review and revisit earlier material, even after you have completed the course.
If I’ve got your interest, then I’ve included links so you can purchase the course at a substantial discount. Go on, have a look now! It’s all supported by an 30 day, no-questions-asked, money-back guarantee too.

SQL and Database for Analysts - 75% Off
SQL and Database for Analysts – 75% Off

EnrolButton

75% Off! $12