Problems with “God Objects” or very wide tables

Wide Tables, Wide SQL Tables
Wide SQL Tables

One of the “SQL smells” Phil Factor identifies in his article is the presence of “God Objects” in your Database or design. I agree with him, except that I would call them “very wide tables”. If you find them, then you may have a problem with the Conceptual Model you are using, or possibly t you should be considering using a different tool. In other words, you have a problem with your requirements. You have a “Requirements Smell”.

How many columns make a “God Object” or wide table?

How many columns can you have in an SQL table?
How many columns can you have in an SQL table?

Let’s start with the obvious question: How many columns make a “God Object” or wide table? The maximum number of columns you are allowed to have in a table varies with database manager. For example:

What the actual numbers are can depend on a lot of technical things. One hundred is still a big number.

Database management software will handle wide tables up to their limits. As with most things, when you approach the limit you will start to encounter difficulties, but that is missing the point. Even 100 columns may indicate a problem.

Why are “God Objects” or wide tables a problem?

Wide tables or "God objects" are a problem which originates in the Conceptual Model

The reasons with “God Objects” or wide tables cause an SQL Smell are technical, practical and what you might term business, or even philosophical problems. I’m a Business Analyst, so I’m going to start from the “Conceptual” end, with the Requirements for the database, and then look at the problems which these tables may cause in Development and then when the system is in operation. Also remember, that if we eliminate problems at the conceptual end, then we’re not going to encounter them further on. Wide tables are most certainly a problem with starts at the “Conceptual Model” stage.

”Conceptual Model” or philosophical problems

Each row in a relational table is supposed to represent something. The “something” may be a concrete object in the real world, or it may be something abstract like a contract or a transaction. Would you be able to explain to the users of your system, or your business owners what a single row represents? If not, you are likely to encounter problems.

Thinking about the columns in this wide table, each column is contains a value. How are you going to present or update those values? 1000 fields would make for a very busy screen. Even some sort of graphical representation is likely to be complex. Do your users really need to see all this data together? While there isn’t a rule which says that the whole of an entity has to be presented on a single screen, or as a single report, it has to represent something. Finally, every column in a row provides one value for one thing at one time. Is that really so in your wide table?

Problems during development

“God objects” or wide tables encourage handling one big lump of data. That in turn is going to encourage the creation of complicated code. Maybe life would be easier for everyone if the data and the process descriptions were much more focused.

If you are in an Analyst role, then think about how you are going to explain what should (and should not) be happening with all these columns.

Remember, SQL tables have no concept of “grouping” of the columns. The columns have an order, but it is not something you should be relying on. If you can form columns into groups, then you should probably consider “normalizing” them into other tables.

Problems in operation

“God objects” or wide tables can cause problems when the system is being used. The volume of data each row contains may cause performance problems when rows are read from the table, when rows are updated and when new rows are created.

Why do we get “God objects”?

Wide tables often start from trying to convert large and complex paper forms or spreadsheets straight into table designs. It seems like a good idea at first, but it can get bogged down in unexpected complexity.

Think about your least favourite paper form, especially if it runs to several pages – maybe it’s a tax return or something similar. Obviously the physical form represents something. If you were specifying a system to work with it, then you would be tempted to have a single table where each row represented a single form, there was a column for every question and each cell contained one person’s answer to a question. It would be just like an enormous spreadsheet. Some early commercial computer systems were like that. They worked but they were inflexible.

One clue that something is going wrong (apart from the number of columns) is the number of columns which need to allow “NULL” values. How many times does “Not Applicable” appear when you are filling in the paper form?

How do we solve the problem of the wide table?

The answer is to think about what all these columns mean and then start applying Data Modelling or normalization techniques to break the data into more manageable and useable chunks. If you can from groups of columns then those groups may be candidate entities and therefore candidate tables.

If you need to use the order of similar columns then maybe you should be considering a different table design like the “Entity Attribute Value” (MVP) Pattern. But beware, because that can give rise to a bad smell too!

Excuses for “God Objects” and wide tables

Nothing in Information Technology is ever clear-cut. There are usually grey areas. One person may regard a table as too wide and another may regard it as OK. There is always room for some discussion. There are times when using a table that is a little wider than we would normally like is acceptable. Here are some of the reasons (or maybe that should be excuses) that you may here for wide tables.

  • It gets all the work done in one place, so that other programs can use the data. I don’t really buy this one. I suspect that someone is guessing what these other programs need. If the guess is wrong then someone is going to have to re-design the big, wide table. I continue to maintain that having discrete data and performing discrete actions is better.
  • Here is a specific case I found where someone wanted to retrieve data from 2000 sensors. This is a case where using something other than a relational database might be better in the first instance. Depending on the details it might also be a case where using the Entity Attribute Value (EAV) model is appropriate as well.
  • We are being given the data in the wide form from another system. This excuse I will accept, because it is really being imposed as an external requirement. But! If you need to do this, then you will need to do the work of working out what all those many columns mean, and you may have to break the wide row down into constituent parts.

Where next?

That’s addressed the “God Object” or “Wide table” smell. I’ve already mentioned the “Entity Attribute Value” (EAV) model a couple of times. I’m going to address why that may be give rise to a bad smell in the next post.

What if you don’t have a Data Model?

My previous post got me thinking. One thing leads to another as they say. The whole of my process really requires that you have a Data Model (aka Entity Relationship Diagram/Model and several other names). But what do you do if you don’t have one, and can’t easily create one?

What’s the problem?

Suppose you have a database which has been defined without foreign key constraints. The data modelling tools use these constraints to identify the relationships they need to draw. The result is that any modelling tool is likely to produce a data model which looks like the figure below. This is not very useful!
Faced with this, some people will despair or run away! This is not necessary. Unless the database has been constructed in a deliberately obscure and perverse way (much rarer than some developers would have you believe) then it is usually possible to make sense of what is there. Remember, you have the database itself, and that is very well documented! Steve McConnellwould point out that the code is the one thing that you always have (and in the case of a database, it always matches what is actually there!).
To do what I propose you will need to use the “system tables” which document the design of the database. You will need to know (or learn) how to write some queries, or find an assistant who understands SQL and Relational Databases. I’ve used MS SQL Server in my examples, but the actual names vary between different database managers. For example: I seem to remember that in IBM DB/2 that’s Sysibm.systables. You will have to use the names appropriate for you.

The method

“Method” makes this sound more scientific than it is, but it still works!
  1. Preparation: Collect any documentation
  2. “Brainstorm:” Try to guess the tables/entities you will find in the database.
  3. List the actual Tables
  4. Group the Tables: based on name
  5. For each “chunk”, do the following:
    1. Identify “keys” for tables: Look for Unique indexes.
    2. Identify candidate relationships: Based on attribute names, and non-unique indexes.
    3. Draw your relationships.
    4. “Push out” any tables that don’t fit.
    5. Move on to the next group.
  6. When you’ve done all the groups, look for relationships from a table in one group to a table in another.
  7. Now try and bring in tables that were “pushed out”, or are in the “Miscellaneous” bucket.
  8. Repeat until you have accounted for all the tables.

 

At this point you are probably ready to apply the techniques I described in my previous post (if you haven’t been using them already). You might also consider entering what you have produced into your favourite database modelling tool.

The method stages in more detail.

Preparation:

Collect whatever documentation you have for the system as a whole: Use Cases, Menu structures, anything! The important thing is not detail, but to get an overview of what the system is supposed to do.

“Brainstorm:”

Based on the material above, try to guess the tables/entities you will find in the database. Concentrate on the “Things” and “Transactions” categories described in my previous post.
Don’t spend ages doing this. Just long enough so you have an expectation of what you are looking for.
Remember that people may use different names for the same thing e.g. ORDER may be PURCHASE_ORDER, SALES_ORDER or SALE.

List the tables:

select name, object_id, type, type_desc, create_date, modify_date
from sys.tables
(try sys.Views as well)

Group the tables

Group the tables based on name: ORDER, ORDER_ITEM, SALES_ORDER, PURCHASE_ORDER and ORDER_ITEM would all go together.
Break the whole database into a number of “chunks”. Aim for each chunk to have say 10 members, but do what seems natural, rather than forcing a particular number. Expect to have a number of tables left over at the end. Put them in a “Miscellaneous” bucket.

Identify the candidate keys, and foreign keys from the indexes

select  tab.name, idx.index_id, idx.name , idx.is_unique
from
sys.indexes as idx
join sys.tables as tab on tab.object_id = idx.object_id
where
tab.name like ‘%site%’;
 
select tab.name, col.column_id, col.name 
from
sys.columns as col
Join sys.tables as tab on col.object_id = tab.object_id
where
tab.name like ‘%PhoneBook%’
order by 1, 2 ;
 

From attribute names and indexes, identify relationships

 

  • Sometimes the index names are a give-way (FK_…)
  • Sometimes you have to look for similar column names (ORDER_LINE.ORDER_ID à ORDER.ID)
  • Multi-part indexes sometimes indicate hierarchies.

 

 Push Out

Look for “Inter.group” Relationships

 

Security – Using Server Side security with MS SQL Server

There are times when I want to do something, and it gives rise to questions. Often I have to put the questions to one side in order to get on with the thing which is my immediate priority. When that happens, I put the questions to one side with the intention of coming back to them in the future. This is one of the occasions when I have had the opportunity to go back and answer some of the questions.
The questions in this case were:
  • How does Server-side security work on MS SQL Server? And
  • How can I control the access different users have to my database?

I’m not (have never been, and do not really intend to become) a DBA (Database Administrator). I understand a bit about “privileges” but in the past I’ve always had someone else “doing it for me”, and in any case , I’ve worked on other databases such as DB/2 or Oracle.

After a little bit of research and a bit of experimentation (fiddling around), I found the answer to my questions. Although it was hardly earth-shattering, the understanding is satisfying.
The key is to understand that in MS SQL Server the “Server Instance” and the “Database” are separate entities. It is possible to have several databases inside the same Server instance. In fact I do it all the time when I’m experimenting. This means that there are two separate things to define:
  • A LOGIN, which gives access to the Server Instance, and with Server-side authentication, provides the security, and
  • A USER, which belongs to the Database and is granted the Database privileges (including CONNECT).
Figure 1 Server-side security for MS SQL Server
Summary of the stages: 
  1. Ensure that SQL Server is set up to allow Server Authentication
  2. Create the LOGIN (in the SQL Server Instance)
  3. Create a USER corresponding with the LOGIN (in the Database)
  4. Grant the USER CONNECT privilege (happens by default, but can be revoked)
  5. Grant the USER the appropriate privileges on the Database
If you’re interested in taking this a bit further, I’ve summed all this up in a video on YouTube.