Inappropriate data-types – SQL Smells

Inappropriate Data-types cause SQL Smells!
Inappropriate SQL Data-types cause SQL Smells!

People choose inappropriate data-types. This isn’t surprising. There are lots of SQL data-types, so people make inappropriate choices. Phil Factor names “Using inappropriate data-types” as a smell in his article on SQL Smells.

I’m going to concentrate on dates and numbers in this post. I will explain why people choose inappropriate data-types. I will also describe an approach which will encourage you to choose the right ones.

This will be a superficial treatment. I’m going to look at the problem from a high level. Dates and numbers can suffer from detailed technical problems as well.

Why do people choose inappropriate data-types?

Confusion of Storage, Interchange and Presentation forms of data causes selection of inappropriate data-types
SQL: Storage, Interchange and Presentation forms of data

Phil Factor identifies the main reason for this problem as:

“Confusing how data will be presented with how it will be stored”.

I agree. Here are some reasons Analysts choose inappropriate data-types:

  • We approach problems from the outside and take the users’ point of view. We should consider presentation. Inside the database, data should be stored in an appropriate form.
  • The same argument applies for interfaces and interchange formats. Interface requirements should not determine the way data is stored internally. Interfaces are still important. Where possible, standard interchange formats should be used.
  • Spreadsheets have made us lazy. You don’t have to think about the “data-type” when you key something into a cell. Format and validation are often added afterwards.
  • There are “folk memories” about problems with data in old file-based systems. These systems did not have the rich range of data-types of modern databases and languages.

Consequences of using inappropriate data-types

Inappropriate data types can have serious consequences for a system we are building. Some of these problems are not obvious. Many of these problems apply to all systems. Some of these problems become more important with larger databases.

  • Having the wrong format makes validation harder. It prevents the database engine from checking the content and increases the risk of “garbage data” getting into the system.
  • The possible presence of garbage data makes error handling throughout the system harder.
  • Storing data in “display format” imbeds that format deep inside the system.
  • “Dates” are associated with useful functions which simplify program design.
  • Inappropriate data-types can change how data is sorted. This influences how indexes work and cause performance issues.
  • The “correct” data-types are usually very space efficient. Using alternatives can waste space in the database for no benefit.

Let’s look at some specific examples:

Inappropriate data-types for Numbers, especially currency

SQL: Storing numbers as character strings - inappropriate data-types
SQL: Storing numbers as character strings

It is possible to present numbers as strings, even including the decimal and thousands separators and any related currency symbols.

Interchange files can contain numbers as text, because it is convenient.

Numbers stored as strings are harder to validate.

Numbers stored as strings are sorted differently to numbers stored as numbers. If you doubt me, then try the experiment of illustrated in the Figure: “Number versus Character sorting” in your favourite spreadsheet.

SQL: Number versus Character sorting. Consequences of inappropriate data-types
SQL: Number versus Character sorting

Inappropriate data-types for Dates

"<yoastmark

Some early databases did not handle dates very well. This encouraged designers to do-it-themselves with varying degrees of success.

It is possible to represent a date as an integer. Such a “date” will sort as you expect, but needs its own validation and will not help you with date arithmetic.

It is also possible (and unfortunately common) to store dates in character fields. In most cases this is simply “an accident waiting to happen”!

All these do-it-yourself options are vulnerable to the problem that Americans tend to specify dates “mm-dd-yy” and Europeans (including the British) tend to specify dates “dd-mm-yy”. There is nothing we as analysts can do about this except to make sure that the test data for any system always includes a date with “13th of the Month”!

Benefits of using the appropriate data-types

The benefits of using the appropriate data-types far outweigh any perceived costs. Most of the “cost” is simply being aware that there are options and then not choosing the inappropriate data-types!

Using the appropriate data-types will:

  • Help protect you from “garbage data” (The database will reject an incorrect leap year 29th of February!)
  • Sort as you (and the business) expect without the need to work out the details.
  • Allow you to specify the presentation separately from the storage. Many languages and presentation frameworks have these facilities built in.
  • Take up less storage space.
  • Make your system perform better!

How to prevent appropriate data-types

Choosing the Appropriate Data-type - Problem Prevention. Preventing inappropriate data-types
Choosing the Appropriate Data-type – Problem Prevention

People choose inappropriate data-types in the transition from the “Conceptual Model” to the “Logical Model” (or possibly the “Physical Model”). We have selected the entities and attributes the system needs, but we have chosen inappropriate data-types.

The solution is to separate different aspects of the data and the decisions we need to make in our minds.

Here is the approach I recommend:

In the Conceptual Model:

  • Decide what data you need (the “attribute”) and decide what kind of data it is. Do this from a “non-computer” point of view.
  • If it is a number, say what it counts or measures and what the units are.
  • Treat “money” as a unit.
  • For dates and times, label them loosely as “date”, “time” or “date-time”.
  • Record any “limits”. The database designer may use them in detailed design.
  • Say how the users of the system will expect to see it presented. You will use this information in the user-interface design.

In the Logical Model:

  • Decide what kind of “bucket” the database should put it into. A database professional may help you with this.
  • If it is a number, and a precise value, say it is an Integer (of some kind). For decimals, say how many decimal places you need.
  • Look hard at dates and times. Do you mean “date” or “time of day”? Do you need an “elapsed time” or a “point in time”?

In the Physical Model:

  • Decide exactly what SQL Data-type you are going to use. Many of the basic data-types have alternatives. There are several types of “Integer” and quite a lot of “Date and Time” types.
  • This is a good time to talk to a database professional.

Summary

There are two main reasons for choosing inappropriate data-types in SQL:

  • Concentrating too much on how data will be presented, rather than how it will be stored
  • Making decisions about the Physical Model prematurely

Using inappropriate data-types can have wide-ranging harmful effects on your database and system.

Avoid the problems by following a simple process:

  1. Concentrate on what data the system needs in the Conceptual Model.
  2. Outline how that data should be stored in the Logical Model.
  3. Confirm the exact SQL data-type in the Physical Model.

This does not have to be difficult or time consuming. It fits perfectly well with “Agile” development.

Where next?

“Inappropriate data-types” was a problem with converting a Conceptual Model into a Logical Model. In the next article I’m going to look at the SQL Smells and Requirements Smells around – “Using Hierarchies”.

Process Models: Who is doing what?

Process Models: Who is doing what?

Process Models
Process Model Diagram

Every Business Analyst and Business Consultant should know something about Process Models and Process Modelling. Creating a simple model of what is happening outside the IT System can be a very useful place to start. You may even benefit when there is no IT System at all!

Ways of using Process Models

A Process Model is simply a representation of what the process is doing in the real world. This representation is usually graphical. There are different notations, and a large number of tools to help you draw the pictures.

Trivial Process Model - Documented using a Process Modelling tool
Trivial Process Model – Documented using a Process Modelling tool

Process Models can be used in a number of ways, many of which overlap.

  • A model can be used as a framework to assess the process against some criteria.
  • Models can be used to explore the effect of some change to the process.
  • Models can also be used to show how the physical world and IT interact.
    Using Process Models appropriately can help ensure that any changes are beneficial to the business.

Process Models: “As-Is” and “To-Be”

Process Models can be used to explore changes to a process. The “As-Is” model shows how the process works now, and the “To-Be” model shows how the process will work after the proposed changes. Comparing the two models allows us to demonstrate how the changes will be beneficial to the business.

The changes need not be changes to IT systems. The benefits which can be demonstrated may be the elimination of roles, or reduction in time or the number of steps.

Process Models: How does IT mesh with the business?

Gears - Process Models show how things interact
Gears – Process Models show how things interact

The Swim-lane process models demonstrate how different roles collaborate or people use several different tools or IT systems to perform their work. Imagine the different roles or tools as “gears” and you will understand what I mean. Using swim-lanes helps you to visualise and communicate how the different lanes interact.

Traps you can avoid using Process Models

Where a business process involves activities in the physical world (and not just doing things at a screen) then a process model can help put the IT systems into context. Doing this may prevent you spending time on details which are not important.

Ways Process Models can trap you!

Many process modelling tools allow you to break-down individual steps into smaller pieces. Resist the temptation to break things down too early, or everywhere. If you keep your models at a high level you will reduce the amount of work you have to do, and you will not reduce the value of the models.

Conclusion

Process Models can be used to put the IT system into a wider business context. They can even be used to analyse and rationalise processes where the IT systems do not play a significant role. Process Models are commonly used to demonstrate the claimed benefits of a new way of doing things, the so-called “As-Is” and “To-Be” models.

The keys to success with Process Models are to present the simplest model which is appropriate for your needs and to control the amount of detail.

Working Procedures: Benefits of doing things consistently

Working Procedures: Benefits of doing things consistently

Working Procedure Diagram
Working Procedure Diagram

I’ve been busy. One of the things which got me through the busy period was that I have “Working Procedures” for some of the boring but important tasks I have to do. Having these things documented made it much easier for me to delegate them or simply tell myself to get on and do them.

“Busy-ness” can come about for all sorts of reasons, including distractions and interruptions from the outside world and higher priority project work. In my case it was a mixture of all of these and a few more too!

I’ve always been keen on writing things down, but I sometime last year I discovered the work of Sam Carpenter and his “Work the System”.  At first I thought “Work the System” meant taking advantage of the way things worked and “cheating” in some way. That was wrong. Sam’s point is that it is easy to spend one’s time in “crisis management” or as Sam puts it – “fire killing”. . Instead, Sam proposes that we invest time in writing procedures or “maintaining the machines”. That way, over time, we will perform tasks more consistently and our performance will improve. Reading Sam’s book (and I have no connection with Sam Carpenter. I don’t expect he even knows I exist) convinced me that this was something worth trying. I tried it out and I think that it is helping me and my business be more consistent and make progressive improvements.

Working Procedures: Benefits

The benefits of using a working procedure are:

  • Nobody has to remember or research how to do the task.
  • The task is performed consistently. The way it is done this time is the same as the way it was done last time.
  • If a problem occurs, then the procedure documents what was being done (and the expected outcome)
  • The Working Procedure creates a framework for documenting the solution to a problem and therefore incrementally improving the procedure and the way we work.
  • The Working Procedure forms the basis for delegating performing the task.

Working Procedures: Candidates

I’m not that keen on maintenance and administration. I don’t think I’m alone in that either! I know maintenance and administration tasks are necessary but I used to make excuses, and put them off till tomorrow. The trouble with doing that is that a backlog builds up, especially when I’m busy. That backlog gives rise to anxiety which becomes a distraction, that’s not a good situation.

One of my Working Procedures is getting my website backed up. It isn’t difficult but it needs to be done. Now that I have it written down, I just do the procedure (or even better, give it to someone else!) and it gets done – consistently.

Working Procedures: Getting Started

Snapshot of "Website Backup" Working Procedure
Snapshot of “Website Backup” Working Procedure

I think I can hear someone saying “but I don’t have time to write the procedure”. Well, I think you do. All I did the first time was write down what I was going to do and then check off the items as I did them (and make some minor adjustments). When I was finished I saved the check-list. The next time I did a back-up I got out the procedure. The second time around it was quicker and easier, because I didn’t have to think too hard about what I had to do. The third and fourth times the task got progressively easier and what’s more, when I needed to make minor adjustments, I already had the framework for recording them.

Conclusion

There is almost no downside to this. Having Working Procedures, especially for the tasks which we want to avoid helps get them done. It worked for me. Trying out the approach doesn’t have to cost anything at all. If it works for you, you can do more of it. If it doesn’t work for you then you haven’t lost very much.

Mind maps and SQL

In a recent discussion on LinkedIn, I mentioned that I use Mind-mapping. I generally prefer pen and paper or pen and white-board, because I don’t like to be constrained by what the tool wants to do. I do use Freemind sometimes, and I said in the discussion that I what I sometimes do is:

  1. Create the mindmap freehand
  2. Transfer that into Freemind – which consolidates the thinking and gives me something with is tidy and easier to maintain, and then
  3. “Print” the map to pdf – which is easy to distribute and can form the basis of discussion at a distance.

I thought I would illustrate this with an example:

A recent project of mine has been creating an introductory course titled “SQL and Relational Databases for analysts”.

The objective of the course was to give a basic understanding of SQL to Business and Technical Analysts.
It was intended to use MS SQL Server, but not be a course on SQL Server. The reason for this was to make the skills learned as portable as reasonably possible.

As it was intended to be an introduction, certain things I would like to have included (like UNION, HAVING and the database catalogue) didn’t make the cut on grounds of keeping the size of the course down.

Anyway, the content of the course was documented in a mind-map which was then discussed with people in different places over a short period. I’ve attached the final version of the mind-map.
Everything in the mind-map (with the exception of the “title block” in the middle) was produced in Freemind.

The mind-map proved to be useful for agreeing what the content and structure of the course was going to be and then as a reminder of scope during the development of the course.

Here’s mind-map (it was intended to be printed, if that ever happened, on A3 paper).