LIKE with an initial wildcard character – SQL Smell

Leading wildcards with LIKE an SQL Smell
Leading wildcards with LIKE are an SQL Smell

Phil Factor says “using LIKE in a WHERE clause with an initial wildcard character” is an SQL Smell, but why? Why does the position of the wildcard make a difference?

This really is a problem, and it gets more serious with larger tables.

I’m going to explain what the problem is from an analyst’s point of view.

What LIKE does for us

Like in the SQL WHERE clause
Like in the SQL WHERE clause

LIKE identifies rows which match the criteria. It can be used with character columns and combined with other conditions.

The first time I encountered this statement I was amazed that it worked at all! This simply demonstrates how SQL protects us from what is really happening underneath the surface.

Like with a trailing wildcard using an index
Like with a trailing wildcard using an index

Appearances can be deceptive. LIKE only appears to match the search argument against the contents of the table column. It has to use an index in order to perform properly. The entries in the index are sorted. SQL uses the sort sequence with “trailing” wildcards, to get to the first match quickly, and then ignore any entries past the last match.

The problem with leading wildcards

Valid LIKE criteria - But two may cause problems
Valid LIKE criteria – But two may cause problems

SQL cannot use an index to matching entries when there is a leading wildcard. As a consequence it will probably scan the whole table. This will work but it may take longer than you want.

The problem gets worse for tables with more rows because with a leading wildcard LIKE has to test every row.

There will be times when you need to use a leading wildcard it in a query. Go ahead and use it, just expect poor performance. If this query which will be used repeatedly, or in a program, then discuss what you are trying to do with the people who look after your database.

Summary

Phil Factor describes leading wildcards with LIKE as an SQL Smell. This may be unavoidable in some ad-hoc queries, but you should treat it with caution if you intend to reuse the query or use it with large tables.

Where next?

The next, and final, blog post is on “Floating Point Numbers”. Do you use floating point numbers in your database? Phil Factor thinks this is an SQL Smell too. Find out why in the next article.

SELECT * – SQL Smell

Is SELECT * FROM Table an SQL Smell?
Is SELECT * FROM Table an SQL Smell?

“SELECT *” What could possibly be wrong with that? Everybody uses SELECT *, don’t they?

Phil Factor describes this as an SQL Smell, and it is. Finding it should immediately make you suspicious. The problem is not the statement, it’s where you find it!

Interactive and Batch SQL

SQL can be used: Interactively, in Batch files and in Programs
SQL can be used: Interactively, in Batch files and in Programs

You can use SQL interactively, in “batch”, and inside programs. One of the good things about SQL is that it looks pretty much the same wherever you find it.

“SELECT *” is intended to be used interactively. That’s how I use it, and I expect Phil Factor does the same. Typing the statement in the figure at a command line, or inside a development environment like SSMS is completely appropriate.

Some people create queries interactively using “SELECT *” as a starting point. That’s legitimate too. It’s a matter of personal style.

Don’t use this form of SELECT in a program or when you expect to reuse it. If you save the file, you shouldn’t be using “SELECT *”.

Why Using “SELECT *” is a problem

SELECT * will continue to work even if columns are removed and added!
SELECT * will continue to work even if columns are removed and added!

Sometimes we want things to break! We want something to fail before something worse happens.

You can change the design of tables in a database. One way is using the ALTER statement. Columns can be added and removed.

“SELECT *” will continue to return a result even when the tables it is using have changed significantly. This is a problem because we don’t know if it is still doing what we originally intended!

Legitimate uses of SELECT *

Legitimate, safe uses of SELECT *
Legitimate, safe uses of SELECT *

There are a few ways you can use an asterisk in a SELECT statement without taking a risk. That is when you are checking if something exists, or counting the number of rows. In both cases the columns of the tables are irrelevant.

Summary

Phil Factor identifies “SELECT *” as an SQL Smell. It can be used interactively, but almost anywhere else it has the potential to cause problems.

Where next?

Do you use “LIKE” in searches? There times when Phil Factor thinks this is an SQL Smell too. Find out why in the next article.

Same column name in different tables- SQL Smells

Two Columns: Same Name, Different Data-type = Nasty! SQL Smell
Two Columns: Same Name, Different Data-type = Nasty! SQL Smell

Here’s another of Phil Factor’s SQL Smells. “Using the same column name in different tables but with different data-types”.

At first glance this seems harmless enough, but it can cause all sorts of problems. All of them are avoidable. If you are an analyst, make sure you are not encouraging this to happen. If you creating the Physical Model or DDL for a database, “Just don’t do it!”

As Phil Factor says, this is: “an accident waiting to happen.”

Two “rights” can make a “wrong”!

The problem here is not “using the same column name in different tables”. That is perfectly ok. Similarly, using “different data-types for different columns” cannot be wrong. That’s exactly what you should expect to do.

The problem is doing both at the same time. The issues are: practical and technical.

The Practical Problem of the same column name with different data-types

Any human user is likely to think that the same name refers to the same type of thing. They won’t check that the definitions of both “names”. No amount of “procedures” or “standards” will make them do anything different.

Sooner or later this will cause one of the technical problems.

The Technical Problems from the same column name with different data-types

Technical problems will occur when a value moves from one column to the other, or when comparing the two columns. Data may be truncated and those data transformations cost effort.

These problems may not manifest themselves immediately. The consequences will be data-dependent bugs and poor performance.

The Solution to this Issue

This smell and the associated problems can be avoided by following some simple rules:

  1. If two columns refer to the same thing (like a foreign key and a primary key), make sure they are the same data type.
  2. If two columns refer to different things, then give them distinct names. (Do not resort to prefixing every column name with the table name. That’s horrible)
  3. Having columns with have different names and the same data-type is perfectly OK.

Summary

“Using the same column name in different tables with different data-types” in an SQL database is simply “an accident waiting to happen.” It is easily avoided. Don’t do it and don’t do anything to encourage it.

Where next?

The next article is about the smells which come from dates and times.

Check Constraint to limit values – SQL Smell?

Do Check Constraints Cause an SQL Smell?
Do Check Constraints Cause an SQL Smell?

In his article on SQL Smells, Phil Factor is not in favour of using Check Constraint to limit the values in columns.

In my previous post I explained why Phil Factor recommends referential integrity. I am going to explain this apparent contradiction.

What are CHECK CONSTRAINTs and what do they do for us?

Using CHECK CONSTRAINT to Restrict Allowed Values
Using CHECK CONSTRAINT to Restrict Allowed Values

Like REFERENCES, the CHECK CONSTRAINT also says what are values can be stored in a column. The logical expression must be true for the value to be allowed.

The logical expression can be limits or even a list of permitted values (as in the illustration).

What is the problem with Check Constraints?

This sounds like a marvellous idea! The benefit is clear. Constraints will exclude invalid data from the database, even when it is loaded using a utility (eg BULK INSERT). We can define constraints which will protect the database from bad data. So what is the problem?

Why does Phil Factor regard this as an SQL Smell? The answer is that using Check Constraints in this way embeds constant values in the structure of the SQL database!

The database structure is “locked down” in commercial systems. Only authorised people are allowed to make changes to the structure. Phil Factor wants to avoid these changes.

When should we use Check Constraints?

This criticism does not mean that we should never use Check Constraints. The ideal constraint will not change in the lifetime of the database.

Possible candidates for using CHECK CONSTRAINTs to Restrict Values
Possible candidates for using CHECK CONSTRAINTs to Restrict Values
  • For attributes which represent “classifications” and “types” we should note how many different values we are expecting, and how frequently the allowed values change. Short lists which change very rarely may be acceptable.
  • On the other hand, consider re-designing a CHECK CONSTRAINT as a FOREIGN KEY by adding an additional table to contain the valid values. This has the benefit of making adding a new value a simple data change!
  • Do not use Check Constraints to enforce arbitrary limits.

Check Constraint and Requirements

We can identify candidates for Check Constraint when we construct the Conceptual Model. We should note:

  • The number of options
  • The expected frequency of change.
  • That information will enable us to make an informed decision about how to validate the values of that column.

Unfortunately, some of the examples using Check Constraints perform the checks against arbitrary values. These examples will work technically but copying them may cause the problems Phil Factor wants us to avoid.

Summary

Check Constraints provide a way of validating data values. They are appropriate for checking against values which do not change.

For lists, lookup-tables with Foreign Key Constraints may be better.

Do not use Check Constraints against arbitrary values or values which change frequently.

Where next?

The next article covers “Indexes”. I will explain how an Analyst can influence some design decisions.

Not using Referential Integrity – An SQL Smell!

Not Using Referential Integrity - SQL Smell?
Not Using Referential Integrity – SQL Smell?

In his article on SQL Smells, Phil Factor describes identifies two “smells” associated with Constraints in SQL. He regards not using Referential Integrity as an SQL Smell.

In this post I will explain what a Constraint is, and how one kind of Constraint provides Referential Integrity.

What are CONSTRAINTs and what do they do for us?

A Constraint limits the values which are allowed in a column. The SQL database manager will use constraints to validate data. There are several types of constraint.

Phil Factor thinks that not enforcing Referential Integrity is usually bad (and I agree).

Referential Integrity and the FOREIGN KEY REFERENCES Constraint

The FOREIGN KEY REFERENCES constraint on a column makes it a Foreign Key to another table. The Foreign Key must be present as a Primary Key in the referenced table. This is called Referential Integrity.

In the illustration, it is not possible for the Order table to contain a CustomerID which does not exist in the table Customer.

Enforcing Referential Integrity with REFERENCES
Enforcing Referential Integrity with REFERENCES

This is very powerful. The constraint makes the database manager check every change. Statements which would violate the Foreign Key constraint are rejected.

Performing the validation costs effort.

Phil Factor points out an additional benefit. The database manager may be able to use the Foreign Keys to improve performance of some queries.

Why wouldn’t you use Referential Integrity?

The arguments most frequently used for not implementing referential integrity are:

  • Performance: I have already mentioned the “cost” of checking the constraints. This argument:
    • Prejudice against logic in the database
    • A desire to minimise the “load” on the database manager.
  • Dirty Data: An “upstream” system may provide data which violates the constraint. Accepting this argument may mean importing faulty data, and the associated problems into your system!

“Staging tables” may be a better solution to this problem.

Both of these arguments are valid in some circumstances. There are times alternatives are better. Ask the following questions:

  • How are you going to perform the validation?
  • Which system components are going to perform the validation?
  • Do you expect the alternative solution to perform better?
  • Will the alternative solution be better in some way?

Not having answers to these questions is not really acceptable. Try and make a rational decision, based on numbers.

How Referential Integrity and Requirements interact

The Conceptual Data Model identifies business entities and relationships. Those relationships define the referential integrity requirements.

Not using referential integrity implies that:

  • The system is going to allow invalid data, or
  • The system is going to validate the data in some other way.

Summary

Foreign Key Constraints exclude some invalid data from the system. Orphaned records (eg non-existent Customers) are impossible.

Exceptions to this rule require a rational justification.

Where next?

In the next article I will look at “Check Constraints” which are a different way of ensuring valid data.