SQL Naming Conventions prevent SQL Smells

SQL Naming Conventions prevent bad SQL Smells

SQL Naming Conventions prevent bad SQL Smells

Anything written about Information Technology has to mention naming conventions. Phil Factor’s article on SQL Smells describes several smells which can be caused by poor naming of objects in the database. In this article, I am going to concentrate on naming tables and columns.

What makes a good naming convention?

 

I don’t think I can do better than quote Phil Factor in his original article:

“Identifiers should help to make SQL readable as if it were English.”

By “Identifiers” Phil means the names of the objects. The name of the object is to distinguish it from all the other things in the database. It does not need to fully describe the object. It is good if it tells people what the object contains, but it does not have to describe it in detail. There are better ways to do that.

By the way, SQL was originally meant to be called “Structured English Query Language” but the name was altered because of a trademark clash.

Short, meaningful names are good. Nobody will thank you for meaningless names like “T1231237” or “TEMP000”. I’ve actually seen both of those! Similarly, nobody who has to type it in is going to like really long names.

Any standards, naming conventions, or guidelines should help you to choose names which are useful where you are working. I’m not going to tell you what to do, but I am going to suggest some things to avoid, or at least to watch out for.

Maximum size in characters of Table and Column names in SQL Server, Oracle and MySQL
Maximum size in characters of Table and Column names in SQL Server, Oracle and MySQL

Naming things well is an art. Naming linking tables is especially hard. It is worth spending a little (but not too much) time on naming things. The names you choose will be referenced throughout your system and will be with with you for a long time.

Over the years, the maximum size permitted for table names has increased. That has made life easier. Really short table names practically demanded structured, coded names which were a pain. On the other hand, please don’t get carried away with long names.

Naming Conventions by exception

Rather than provide detailed guidance on a naming convention, other than the “Goldilocks – Meaningful, not too short, not too long, just right!”, I’m going to give you a list of naming SQL Smells which should make your nostrils twitch:

  • Avoid using reserved words in names: I did it inadvertently once by calling a column “Desc” (for “Description”) which clashed with DESC for “descending” in SQL. It caused all sorts of trouble and confusion.
  • Avoid special characters (including spaces) in names: This is allowed if you enclose the name in square brackets ([Table$name]), but this really is looking for trouble and making work. It is far better to keep things simple. In the same spirit, don’t use square brackets where they are not necessary ([Table_Name]). It’s ugly and harder to read.
  • Be suspicious of numbers in object names: Why would a table-name need to contain a number? There are occasional cases where a column name might contain a number. By convention “Address_Line_1” etc. is fairly common. But they are rather rare. Numbers in object names should make you suspect the data model.
  • Avoid system-generated object names: SQL Server (and other database managers) will provide default names for some objects, like indexes and constraints. The names it uses are ugly and sometimes give little clue what the object is doing.

Summary

When you are designing your own database, follow the local conventions and concentrate on making the names readable and easy to use.

Where next?

Having dealt with the issue of naming, in the next article I’m going to look at a specific SQL Smell – “Packing lists into a column”.

Polymorphic Association – bad SQL Smell!

Polymorphic Association creates a bad SQL Smell
Polymorphic Association creates a bad SQL Smell

When I first read Phil Factor’s article on SQL Smells I had to look up Polymorphic Association  to make sure that I understood what he was getting at. He’s right – this is a bad SQL smell! In my option it is one that can always be avoided.

But what is a “polymorphic association” in SQL?

Polymorphic Association

The reason I didn’t recognise Phil Factor’s polymorphic association, is that I knew by the name “alternative parent”. Polymorphic Association is probably known by other names too!

Relational Databases don’t recognise the idea the idea of sub-typing. The construction Phil Factor doesn’t like is the result of a flawed attempt to provide sub-typing.

We need to deal with some real situations like:

  • “Customer” – has the sub-types:
    • Person
    • Corporate Entity (or “Business”)
  • “Something we sell” (perhaps “Saleable item”?) – has the sub-types:
    • Product
    • Service

One way to do this is to define entities with sub-types. Doing sub-typing like this is a decision in the “Conceptual Model”. I need to point out that this is contentious because it builds the sub-types into the structure of the database itself. They become “hard-coded”.

In the rest of this article I’m going to use the “Customer: Person/Business” example.

What are we trying to achieve with “polymorphic association”?

Two ways of representing sub-typing
Two ways of representing sub-typing

In the example, someone has decided that we need to be able to record that Orders. Every Order is placed by either a Person or a Business. They have decided to have separate entities for Person and Business because they have different attributes. I have seen this conceptual requirement represented in several different ways, including:

  • “Person” and “Business” surrounded by a box representing “Customer”
  • A bar (often curved) drawn across the lines representing the relationships between Person or Business and Order.
  • A Customer super-type with Person and Business sub-types.

The requirements are fairly simple:

  • An Order is always placed by a Customer
  • A Customer has to be a Person or a Business (as far as we are concerned, it can’t be both)
  • So every Order is placed by either a Person or a Business.

What’s the problem with “polymorphic association”?

The problem of polymorphic association
The problem of polymorphic association

Phil Factor identifies a problem with “polymorphic association” when the database has reached the Physical Design stage. We can spot the potential for this problem to occur much earlier, when we are gathering Requirements and creating the Conceptual Design.

The problem in the Physical Design is that the Foreign Key in Order can refer to more than one table. The Cust_Type column tells use which table to use, but the database manager will not understand that. There are also further clues in the names of the various columns – If you look, you will notice a lot of use of the “super-type” in the names!

There are several problems with this Physical Design (Phil Factor’s article gives a more technical description):

  • It is not easy to spot in the database, and that can be a problem in itself.
  • It will result in complex procedural code.
  • The Order table cannot use constraints to check that the foreign keys are valid.
  • JOINs will be complicated
  • This is construction is likely to perform poorly.

Polymorphic Associations – One solution

One solution to this problem is to recognise the “super-type” and make it explicit.

Super-type and sub-types, highlighting the relationship decorations
Super-type and sub-types, highlighting the relationship decorations

This slightly different construction with a Customer entity or table, enables us to solve the difficulties with the polymorphic association or alternative parent design. Notice how the ends of the relationships nearest the Person and Business tables are not the familiar “crow’s feet” but indicate that there must be either one or zero rows.

When we use this design all the Foreign Keys in the Order table now reference rows in the Customer table.

A solution to polymorphic association with a super-type
A solution to polymorphic association with a super-type

Spotting potential Polymorphic Associations

Deal with Polymorphic Associations during the Conceptual Design of the database. The indications that the Requirements create a situation where someone might try to use a Polymorphic Association are:

  • There are several entities which are sufficiently similar that you consider having one entity, but are sufficiently different that you think multiple entities are justified.
  • Something (like an order) needs to have an association with one (but only one) from among this group.
  • Finally, watch for the word “OR” in your thinking. A Person OR a Business places and Order.

If you encounter these conditions, first you should challenge whether sub-typing is the correct solution. If you think it is, then use the single super-type and multiple sub-types form and avoid the polymorphic association.

Summary

I agree with Phil Factor. If you encounter the polymorphic association, stop! It is very unlikely to the the best solution in SQL. Don’t try to implement a polymorphic association. Create an explicit table for the super-type and then a separate table for each of the sub-types instead.

Where next?

In the next article I’m going to look at something which is responsible for a lot of bad “SQL Smells”. This multiple bad smell is the naming of SQL objects.

Problems with “God Objects” or very wide tables

Wide Tables, Wide SQL Tables
Wide SQL Tables

One of the “SQL smells” Phil Factor identifies in his article is the presence of “God Objects” in your Database or design. I agree with him, except that I would call them “very wide tables”. If you find them, then you may have a problem with the Conceptual Model you are using, or possibly t you should be considering using a different tool. In other words, you have a problem with your requirements. You have a “Requirements Smell”.

How many columns make a “God Object” or wide table?

How many columns can you have in an SQL table?
How many columns can you have in an SQL table?

Let’s start with the obvious question: How many columns make a “God Object” or wide table? The maximum number of columns you are allowed to have in a table varies with database manager. For example:

What the actual numbers are can depend on a lot of technical things. One hundred is still a big number.

Database management software will handle wide tables up to their limits. As with most things, when you approach the limit you will start to encounter difficulties, but that is missing the point. Even 100 columns may indicate a problem.

Why are “God Objects” or wide tables a problem?

Wide tables or "God objects" are a problem which originates in the Conceptual Model

The reasons with “God Objects” or wide tables cause an SQL Smell are technical, practical and what you might term business, or even philosophical problems. I’m a Business Analyst, so I’m going to start from the “Conceptual” end, with the Requirements for the database, and then look at the problems which these tables may cause in Development and then when the system is in operation. Also remember, that if we eliminate problems at the conceptual end, then we’re not going to encounter them further on. Wide tables are most certainly a problem with starts at the “Conceptual Model” stage.

”Conceptual Model” or philosophical problems

Each row in a relational table is supposed to represent something. The “something” may be a concrete object in the real world, or it may be something abstract like a contract or a transaction. Would you be able to explain to the users of your system, or your business owners what a single row represents? If not, you are likely to encounter problems.

Thinking about the columns in this wide table, each column is contains a value. How are you going to present or update those values? 1000 fields would make for a very busy screen. Even some sort of graphical representation is likely to be complex. Do your users really need to see all this data together? While there isn’t a rule which says that the whole of an entity has to be presented on a single screen, or as a single report, it has to represent something. Finally, every column in a row provides one value for one thing at one time. Is that really so in your wide table?

Problems during development

“God objects” or wide tables encourage handling one big lump of data. That in turn is going to encourage the creation of complicated code. Maybe life would be easier for everyone if the data and the process descriptions were much more focused.

If you are in an Analyst role, then think about how you are going to explain what should (and should not) be happening with all these columns.

Remember, SQL tables have no concept of “grouping” of the columns. The columns have an order, but it is not something you should be relying on. If you can form columns into groups, then you should probably consider “normalizing” them into other tables.

Problems in operation

“God objects” or wide tables can cause problems when the system is being used. The volume of data each row contains may cause performance problems when rows are read from the table, when rows are updated and when new rows are created.

Why do we get “God objects”?

Wide tables often start from trying to convert large and complex paper forms or spreadsheets straight into table designs. It seems like a good idea at first, but it can get bogged down in unexpected complexity.

Think about your least favourite paper form, especially if it runs to several pages – maybe it’s a tax return or something similar. Obviously the physical form represents something. If you were specifying a system to work with it, then you would be tempted to have a single table where each row represented a single form, there was a column for every question and each cell contained one person’s answer to a question. It would be just like an enormous spreadsheet. Some early commercial computer systems were like that. They worked but they were inflexible.

One clue that something is going wrong (apart from the number of columns) is the number of columns which need to allow “NULL” values. How many times does “Not Applicable” appear when you are filling in the paper form?

How do we solve the problem of the wide table?

The answer is to think about what all these columns mean and then start applying Data Modelling or normalization techniques to break the data into more manageable and useable chunks. If you can from groups of columns then those groups may be candidate entities and therefore candidate tables.

If you need to use the order of similar columns then maybe you should be considering a different table design like the “Entity Attribute Value” (MVP) Pattern. But beware, because that can give rise to a bad smell too!

Excuses for “God Objects” and wide tables

Nothing in Information Technology is ever clear-cut. There are usually grey areas. One person may regard a table as too wide and another may regard it as OK. There is always room for some discussion. There are times when using a table that is a little wider than we would normally like is acceptable. Here are some of the reasons (or maybe that should be excuses) that you may here for wide tables.

  • It gets all the work done in one place, so that other programs can use the data. I don’t really buy this one. I suspect that someone is guessing what these other programs need. If the guess is wrong then someone is going to have to re-design the big, wide table. I continue to maintain that having discrete data and performing discrete actions is better.
  • Here is a specific case I found where someone wanted to retrieve data from 2000 sensors. This is a case where using something other than a relational database might be better in the first instance. Depending on the details it might also be a case where using the Entity Attribute Value (EAV) model is appropriate as well.
  • We are being given the data in the wide form from another system. This excuse I will accept, because it is really being imposed as an external requirement. But! If you need to do this, then you will need to do the work of working out what all those many columns mean, and you may have to break the wide row down into constituent parts.

Where next?

That’s addressed the “God Object” or “Wide table” smell. I’ve already mentioned the “Entity Attribute Value” (EAV) model a couple of times. I’m going to address why that may be give rise to a bad smell in the next post.