DateTime, Date and Time – SQL Smells

Date and Time - Point or Period - An SQL Smell?
Date and Time – Point or Period – An SQL Smell?

Phil Factor identifies several SQL Smells associated with the use of DateTime, Date and Time data-types. Using the wrong types will waste space, harm performance and create “odd” behaviour.

If you are clear about what you are recording, you will avoid these issues. I prefer to say what I want, and let an expert choose the best date-types. In other words, I prefer to separate the analyst and designer roles. That way I avoid suggesting the wrong types.

DateTime, Date or Time – Which do you need?

DateTime columns which contain only Date or Time contribute two of SQL Smells. This wastes space, both in storage and in memory (which will degrade performance).

There is something worse here. Using DateTime in this way suggests we are not clear about what we want. This lack of clarity encourages the designer to “hedge their bet” by using DateTime.

How precise do you need to be?

Business Systems need to record dates and times, but they don’t need great precision. For many business transactions, the nearest second or even minute is adequate. In some cases recording extra precision can be misleading.

Storing Durations

For a “Period” you need to know 3 things: Start, End and Duration. You only need to store two out of the 3. The third value can always be calculated if you have the other two.

Finding if an Event is inside a Period is easier using Start and End
Finding if an Event is inside a Period is easier using Start and End

Storing the “start” and “end” is more flexible. It is straightforward to work out if an event took place within a given period.

Finding if Periods Overlap is easy using Start and End times
Finding if Periods Overlap is easy using Start and End times

Telling if periods overlap is easy too.

If you decide to store a duration you must specify the units you intend and the precision you need. (As Phil puts it “milliseconds? Quarters? Weeks?”). Do not be tempted to store a duration in a “Time” data-type! “Time” is intended for “Time-of-day” not duration.

Dates and Times: Choosing the right data-type – Some simple questions

Choosing appropriate data-types for dates and times is not difficult, if you go about it the right way.

Divide the job into two steps: “The Requirement” and the “The Most suitable technical data-type”. Do the two steps separately, taking into account any local standards and conventions.

The Requirement

  1. Is this an “Event” (a “Point in Time”) a “Period” or a “Duration” (both have beginning and an end)?
  2. Event: What is the best way to represent this data? Should it be a “Date” or a “Time”? Does it really need “Date-time”?
  3. Duration: What are the units and scale of this duration?
  4. How precise do you need the value to be? You may surprise yourself.
  5. How long do you expect this system (and the data in it) to last?

The Most Suitable Technical Data-type

Technical Properties of Date and Time data-types in Microsoft SQL Server
Technical Properties of Date and Time data-types in Microsoft SQL Server

Use the table (based on Microsoft’s Documentation) to choose the best data-type for your needs. This table is for SQL Server. Other database managers will have similar tables.

Microsoft recommends using DATETIME2 in preference to DATETIME for new work (it takes up less space and is more accurate). Providing the maximum date is acceptable, I would consider SMALLDATETIME for business transactions (but you do risk creating a “Year 2000 problem” if the data turns out to have a long life).

If your system will span several time-zones, then you should definitely consider the benefits of using the DATETIMEOFFSET data-type.

Summary

The DateTime, Date and Time data-types can all cause SQL Smells when they are used inappropriately. Problems can be avoided by following some simple guidelines.

Where next?

Phil Factor doesn’t like “SELECT *”. Find out why in the next article.

Practice makes…?

One of the issues with working from a home office, as I do a great deal of the time, is “education” or “training”. I need to keep up to date. I need to learn about new things. The problem is that very few opportunities come and knock on my door. Of course, the internet is a wonderful thing, but it is like a good public library. If you like reading, you can get lost or even lose yourself in there. That’s where personal recommendation comes in.
Quite recently I found a education site called Udemy. Maybe you knew about it already, I didn’t. I decided to take a couple of courses to find out if I liked the experience: I did, and I do.
One of the courses I took was called “Performance of Speaking” by a man called Tom j Dolan (he writes it like that, so I will as well). I confess, that one of the reasons I took the course was that I was curious about how effective a course in such a subject could be as distance learning. All I can say is “it worked for me”.
I consider myself a reasonable public speaker. I am comfortable addressing a room containing tens or maybe even a hundred people. I haven’t tried addressing a stadium full yet but maybe that will come. Never-the-less I felt there was room for improvement.
Tom’s credentials are excellent and his approach is quite simple: public speaking is a practical skill. It is something which can be learned. He makes an important point: many of us are too critical of ourselves. We demand “perfection” (whatever that is). That is really an unreasonable demand we are making. Instead we should aim for improvement “Kaizen” as the Japanese would have it. Continuous improvement is a better goal than perfection. We can usually improve. The best musicians practice constantly.
There are many skills like this. We can learn the facts, we can answer questions and give the “right” answers, but to become really good at them, we have to practice. We (or at the very least, I) are creatures of habit. When we first learn a new behaviour it takes a great deal of effort. As we practice we get better at the execution of the behaviour, but not only that, we also find that we have more capacity to think about how and why we are doing it. Experience is a valuable thing.
As part of something I am doing at the moment, I need to record and then edit my own voice. Tom’s course has helped me get used to the awkwardness I felt. It hasn’t changed the content at all but it has improved the delivery and also how I feel about the delivery.

Have you considered using the cloud?

…I know I have. If you read the blurb being written by all and sundry (and now including me) you would be forgiven for imagining that the entire world either lives with its head in the cloud, or is considering doing so in the near future.
As I allow myself a limited budget of both time and money for “education” I’m careful what I spend it on.  Last Thursday (24th October 2013) I went to the “Cloud Success Roadshow” in Limerick. It was well worth my investment in time.
The roadshow is run by a company called Let’s Operate (http://www.letsoperate.com/) and although they show you their products, they definitely didn’t go in for the hard sell.
The roadshow reminded me of the factors I need to consider about my “IT strategy” as a whole. In a lot of cases the issue is not finding out what the characteristics of the “Cloud Option” will be, but finding out matching characteristics of the alternative. For example:
  • “The Cloud will cost x” (per user/month), but how much does my server actually cost me?
  • For that matter, how much is all that data worth (to me)?
Thinking properly about the Cloud will almost certainly make you think very hard about the speed and quality of your broadband connection and your dependence on it. If, like me, you live out in the country and work from home some of the time then this is important.  Never mind the quality, a year ago, I lost my broadband when some “eejit” demolished a telegraph pole just down the road! I was sent scurrying down the road to borrow a connection from an acquaintance in order to send a vital eMail reply. I’ve done something about that.
There are still a couple of stops planned on the roadshow. If you live in Dublin or Belfast I suggest you consider spending an afternoon there if you have the time.

Will losing constraints set you free?

I’ve been busy with a project, I’ve finally got round to writing this a week later than I intended…
In a recent conversation, someone pointed out that people sometimes remove “constraints” from a database in order to improve performance. This made me ask myself:
Is this a good thing, or a bad thing?
I have to admit that this is a technical change that I have considered in the past. Never-the-less, I have mixed feelings about it.
After some thought, my opinion is:
  • For many situations a constraint is redundant. The fundamental structure of many applications means they are unlikely to create orphan rows.
  • The cost of the constraint is in the extra processing it causes during update operations. This cost is incurred every time a value in the constrained column is updated.
  • The benefit of a constraint is that it absolutely protects the constrained column from rogue values. This may be particularly relevant if the system has components (such as load utilities or interfaces with other systems) which by-pass the normal business transactions.
  • Other benefits of constraints are that they unequivocally state the “intention” of a relationship between tables and they allow diagramming tools which navigate the relationships to “do their thing”. Constraints provide good documentation, which is securely integrated with the database itself.
In short:
  • The costs of constraints are small, but constant and in the immediate term.
  • The benefits of constraints are avoiding a potentially large cost, but all in the future.
It’s the old “insurance” argument. Make the decision honestly based on a proper assessment of the real risk and your attitude to taking risks. Be lucky!

More Detailed Argument

For those who don’t just want to take my word for it. Here is a more detailed argument.
Let’s take the “business data model” of a pretty normal “selling” application.
When we perform the activities “Take Order” (maybe that should be “Take ORDER”), or “Update Order”
  • we create or update the ORDER and ORDER_LINE entities, and
  • in addition we refer to PRODUCT (to get availability and Price) and presumably to the CUSTOMER entity which isn’t shown on the diagram.
When I translate this into a Logical data model, I impose an additional rule “Every ORDER must contain at least 1 ORDER_LINE”. The original business model doesn’t impose this restriction.
Remember some people do allow ORDERs with no ORDER_LINES. They usually do it as part of a “reservation” or “priority process” which we are not going to try and have here.
When the transaction which creates the ORDER and ORDER_LINE makes it’s updates, then it will have read CUSTOMER and ORDER, so it is unlikely to produce orphan records, with or without constraints.
On the other hand, by having the constraints we can document the relationships in the database (so that a diagramming tool can produce the ERD diagram (really I suppose that should be “Table Relationship Diagram”)).
I am left wondering whether it would be possible or desirable to enforce my  “Every ORDER must contain at least 1 ORDER_LINE” rule. I’ll think about that further. (Note to self: Can this be represented as a constraint which does not impose unnecessary and unintended restrictions on creating an ORDER?)
If we don’t have constraints and we have something other than our transaction which is allowed to create ORDERs and/or ORDER_LINEs (As I said, typically this would be an interface with another system or some kind of bulk load), we have no way of knowing how reliably it does it’s checking, and we might be allowing things we really do not want into our system. Constraints would reject faulty records and the errors they created (or “threw”) could be trapped by the interface.

 

What if you don’t have a Data Model?

My previous post got me thinking. One thing leads to another as they say. The whole of my process really requires that you have a Data Model (aka Entity Relationship Diagram/Model and several other names). But what do you do if you don’t have one, and can’t easily create one?

What’s the problem?

Suppose you have a database which has been defined without foreign key constraints. The data modelling tools use these constraints to identify the relationships they need to draw. The result is that any modelling tool is likely to produce a data model which looks like the figure below. This is not very useful!
Faced with this, some people will despair or run away! This is not necessary. Unless the database has been constructed in a deliberately obscure and perverse way (much rarer than some developers would have you believe) then it is usually possible to make sense of what is there. Remember, you have the database itself, and that is very well documented! Steve McConnellwould point out that the code is the one thing that you always have (and in the case of a database, it always matches what is actually there!).
To do what I propose you will need to use the “system tables” which document the design of the database. You will need to know (or learn) how to write some queries, or find an assistant who understands SQL and Relational Databases. I’ve used MS SQL Server in my examples, but the actual names vary between different database managers. For example: I seem to remember that in IBM DB/2 that’s Sysibm.systables. You will have to use the names appropriate for you.

The method

“Method” makes this sound more scientific than it is, but it still works!
  1. Preparation: Collect any documentation
  2. “Brainstorm:” Try to guess the tables/entities you will find in the database.
  3. List the actual Tables
  4. Group the Tables: based on name
  5. For each “chunk”, do the following:
    1. Identify “keys” for tables: Look for Unique indexes.
    2. Identify candidate relationships: Based on attribute names, and non-unique indexes.
    3. Draw your relationships.
    4. “Push out” any tables that don’t fit.
    5. Move on to the next group.
  6. When you’ve done all the groups, look for relationships from a table in one group to a table in another.
  7. Now try and bring in tables that were “pushed out”, or are in the “Miscellaneous” bucket.
  8. Repeat until you have accounted for all the tables.

 

At this point you are probably ready to apply the techniques I described in my previous post (if you haven’t been using them already). You might also consider entering what you have produced into your favourite database modelling tool.

The method stages in more detail.

Preparation:

Collect whatever documentation you have for the system as a whole: Use Cases, Menu structures, anything! The important thing is not detail, but to get an overview of what the system is supposed to do.

“Brainstorm:”

Based on the material above, try to guess the tables/entities you will find in the database. Concentrate on the “Things” and “Transactions” categories described in my previous post.
Don’t spend ages doing this. Just long enough so you have an expectation of what you are looking for.
Remember that people may use different names for the same thing e.g. ORDER may be PURCHASE_ORDER, SALES_ORDER or SALE.

List the tables:

select name, object_id, type, type_desc, create_date, modify_date
from sys.tables
(try sys.Views as well)

Group the tables

Group the tables based on name: ORDER, ORDER_ITEM, SALES_ORDER, PURCHASE_ORDER and ORDER_ITEM would all go together.
Break the whole database into a number of “chunks”. Aim for each chunk to have say 10 members, but do what seems natural, rather than forcing a particular number. Expect to have a number of tables left over at the end. Put them in a “Miscellaneous” bucket.

Identify the candidate keys, and foreign keys from the indexes

select  tab.name, idx.index_id, idx.name , idx.is_unique
from
sys.indexes as idx
join sys.tables as tab on tab.object_id = idx.object_id
where
tab.name like ‘%site%’;
 
select tab.name, col.column_id, col.name 
from
sys.columns as col
Join sys.tables as tab on col.object_id = tab.object_id
where
tab.name like ‘%PhoneBook%’
order by 1, 2 ;
 

From attribute names and indexes, identify relationships

 

  • Sometimes the index names are a give-way (FK_…)
  • Sometimes you have to look for similar column names (ORDER_LINE.ORDER_ID à ORDER.ID)
  • Multi-part indexes sometimes indicate hierarchies.

 

 Push Out

Look for “Inter.group” Relationships