News

Floating Point Numbers – SQL Smell

Do you use Floating Point numbers? Do you even know what they are?

Phil Factor regards Floating Point Numbers as an SQL Smell. That is to say “something which may cause problems”.

I’m going to explain why this is from an analyst’s point of view.

What are Floating Point numbers?

When I first started programming with FORTRAN we only had two sorts of numbers: integers and floating-point. You used integers for counting things and floating-point for calculations. It was that simple!

Modern languages and SQL databases allow other options, including precise decimal forms.

Floating point numbers use the binary equivalent of “Scientific” or “Engineering” notation. They stored differently to integers or decimals.
Their value is stored in two parts which are known as the “mantissa” and the “exponent”. This arrangement allows them to hold an enormous range of values very efficiently but at the expense of precision.

Problems with Floating Point numbers

Floating point numbers are intended for scientific or engineering calculations.

They can cause problems in normal business calculations.

• Arithmetic can cause rounding.
• Testing for equality can produce unexpected results.
• Phil Factor does not like them being used in keys or indexes.

Size and range of Numbers

Floating-point numbers allow a very large and very small values. I cannot imagine examples of the extremes.

On the other hand, precise numeric data-types do not have the problems associated with floating point.

When should you use Floating Point numbers?

Very few businesses or “domains” need to use floating-point numbers. The table shows some example where they would be essential. In most cases these are estimates of extremely large, or very small, numbers.

Look at it like this: if you are working in engineering, astronomy or nuclear physics you may need floating-point numbers. You probably know if you really need them.

Summary

I recommend avoiding floating-point numbers. Phil Factor describes them as an SQL Smell which should be investigated. They are essential for some applications. When misapplied they will cause problems. If you think you need to use them, think hard.

Where next?

This is the concluding post in this sequence of articles. I’ve chosen to concentrate on potential problems which are likely to trouble analysts like me. Phil Factor’s original article identifies many more SQL Smells. These include details of design and programming. If you spend time working with SQL and specifying or designing databases, then I recommend the article both as background reading and as a handy reference.

As a Postscript: I don’t think Floating Point Numbers are bad, just people sometimes use them in the wrong way. Hugo Kornelis thought it was worth his time writing a post in defence of them – Float does NOT suck!

LIKE with an initial wildcard character – SQL Smell

Phil Factor says “using LIKE in a WHERE clause with an initial wildcard character” is an SQL Smell, but why? Why does the position of the wildcard make a difference?

This really is a problem, and it gets more serious with larger tables.

I’m going to explain what the problem is from an analyst’s point of view.

What LIKE does for us

LIKE identifies rows which match the criteria. It can be used with character columns and combined with other conditions.

The first time I encountered this statement I was amazed that it worked at all! This simply demonstrates how SQL protects us from what is really happening underneath the surface.

Appearances can be deceptive. LIKE only appears to match the search argument against the contents of the table column. It has to use an index in order to perform properly. The entries in the index are sorted. SQL uses the sort sequence with “trailing” wildcards, to get to the first match quickly, and then ignore any entries past the last match.

SQL cannot use an index to matching entries when there is a leading wildcard. As a consequence it will probably scan the whole table. This will work but it may take longer than you want.

The problem gets worse for tables with more rows because with a leading wildcard LIKE has to test every row.

There will be times when you need to use a leading wildcard it in a query. Go ahead and use it, just expect poor performance. If this query which will be used repeatedly, or in a program, then discuss what you are trying to do with the people who look after your database.

Summary

Phil Factor describes leading wildcards with LIKE as an SQL Smell. This may be unavoidable in some ad-hoc queries, but you should treat it with caution if you intend to reuse the query or use it with large tables.

Where next?

The next, and final, blog post is on “Floating Point Numbers”. Do you use floating point numbers in your database? Phil Factor thinks this is an SQL Smell too. Find out why in the next article.

SELECT * – SQL Smell

“SELECT *” What could possibly be wrong with that? Everybody uses SELECT *, don’t they?

Phil Factor describes this as an SQL Smell, and it is. Finding it should immediately make you suspicious. The problem is not the statement, it’s where you find it!

Interactive and Batch SQL

You can use SQL interactively, in “batch”, and inside programs. One of the good things about SQL is that it looks pretty much the same wherever you find it.

“SELECT *” is intended to be used interactively. That’s how I use it, and I expect Phil Factor does the same. Typing the statement in the figure at a command line, or inside a development environment like SSMS is completely appropriate.

Some people create queries interactively using “SELECT *” as a starting point. That’s legitimate too. It’s a matter of personal style.

Don’t use this form of SELECT in a program or when you expect to reuse it. If you save the file, you shouldn’t be using “SELECT *”.

Why Using “SELECT *” is a problem

Sometimes we want things to break! We want something to fail before something worse happens.

You can change the design of tables in a database. One way is using the ALTER statement. Columns can be added and removed.

“SELECT *” will continue to return a result even when the tables it is using have changed significantly. This is a problem because we don’t know if it is still doing what we originally intended!

Legitimate uses of SELECT *

There are a few ways you can use an asterisk in a SELECT statement without taking a risk. That is when you are checking if something exists, or counting the number of rows. In both cases the columns of the tables are irrelevant.

Summary

Phil Factor identifies “SELECT *” as an SQL Smell. It can be used interactively, but almost anywhere else it has the potential to cause problems.

Where next?

Do you use “LIKE” in searches? There times when Phil Factor thinks this is an SQL Smell too. Find out why in the next article.

DateTime, Date and Time – SQL Smells

Phil Factor identifies several SQL Smells associated with the use of DateTime, Date and Time data-types. Using the wrong types will waste space, harm performance and create “odd” behaviour.

If you are clear about what you are recording, you will avoid these issues. I prefer to say what I want, and let an expert choose the best date-types. In other words, I prefer to separate the analyst and designer roles. That way I avoid suggesting the wrong types.

DateTime, Date or Time – Which do you need?

DateTime columns which contain only Date or Time contribute two of SQL Smells. This wastes space, both in storage and in memory (which will degrade performance).

There is something worse here. Using DateTime in this way suggests we are not clear about what we want. This lack of clarity encourages the designer to “hedge their bet” by using DateTime.

How precise do you need to be?

Business Systems need to record dates and times, but they don’t need great precision. For many business transactions, the nearest second or even minute is adequate. In some cases recording extra precision can be misleading.

Storing Durations

For a “Period” you need to know 3 things: Start, End and Duration. You only need to store two out of the 3. The third value can always be calculated if you have the other two.

Storing the “start” and “end” is more flexible. It is straightforward to work out if an event took place within a given period.

Telling if periods overlap is easy too.

If you decide to store a duration you must specify the units you intend and the precision you need. (As Phil puts it “milliseconds? Quarters? Weeks?”). Do not be tempted to store a duration in a “Time” data-type! “Time” is intended for “Time-of-day” not duration.

Dates and Times: Choosing the right data-type – Some simple questions

Choosing appropriate data-types for dates and times is not difficult, if you go about it the right way.

Divide the job into two steps: “The Requirement” and the “The Most suitable technical data-type”. Do the two steps separately, taking into account any local standards and conventions.

The Requirement

1. Is this an “Event” (a “Point in Time”) a “Period” or a “Duration” (both have beginning and an end)?
2. Event: What is the best way to represent this data? Should it be a “Date” or a “Time”? Does it really need “Date-time”?
3. Duration: What are the units and scale of this duration?
4. How precise do you need the value to be? You may surprise yourself.
5. How long do you expect this system (and the data in it) to last?

The Most Suitable Technical Data-type

Use the table (based on Microsoft’s Documentation) to choose the best data-type for your needs. This table is for SQL Server. Other database managers will have similar tables.

Microsoft recommends using DATETIME2 in preference to DATETIME for new work (it takes up less space and is more accurate). Providing the maximum date is acceptable, I would consider SMALLDATETIME for business transactions (but you do risk creating a “Year 2000 problem” if the data turns out to have a long life).

If your system will span several time-zones, then you should definitely consider the benefits of using the DATETIMEOFFSET data-type.

Summary

The DateTime, Date and Time data-types can all cause SQL Smells when they are used inappropriately. Problems can be avoided by following some simple guidelines.

Where next?

Phil Factor doesn’t like “SELECT *”. Find out why in the next article.

Same column name in different tables- SQL Smells

Here’s another of Phil Factor’s SQL Smells. “Using the same column name in different tables but with different data-types”.

At first glance this seems harmless enough, but it can cause all sorts of problems. All of them are avoidable. If you are an analyst, make sure you are not encouraging this to happen. If you creating the Physical Model or DDL for a database, “Just don’t do it!”

As Phil Factor says, this is: “an accident waiting to happen.”

Two “rights” can make a “wrong”!

The problem here is not “using the same column name in different tables”. That is perfectly ok. Similarly, using “different data-types for different columns” cannot be wrong. That’s exactly what you should expect to do.

The problem is doing both at the same time. The issues are: practical and technical.

The Practical Problem of the same column name with different data-types

Any human user is likely to think that the same name refers to the same type of thing. They won’t check that the definitions of both “names”. No amount of “procedures” or “standards” will make them do anything different.

Sooner or later this will cause one of the technical problems.

The Technical Problems from the same column name with different data-types

Technical problems will occur when a value moves from one column to the other, or when comparing the two columns. Data may be truncated and those data transformations cost effort.

These problems may not manifest themselves immediately. The consequences will be data-dependent bugs and poor performance.

The Solution to this Issue

This smell and the associated problems can be avoided by following some simple rules:

1. If two columns refer to the same thing (like a foreign key and a primary key), make sure they are the same data type.
2. If two columns refer to different things, then give them distinct names. (Do not resort to prefixing every column name with the table name. That’s horrible)
3. Having columns with have different names and the same data-type is perfectly OK.

Summary

“Using the same column name in different tables with different data-types” in an SQL database is simply “an accident waiting to happen.” It is easily avoided. Don’t do it and don’t do anything to encourage it.

Where next?

The next article is about the smells which come from dates and times.