How to “dig into a large database”?

This posting was prompted by a question on one of the BA forums on LinkedIn. The original question was:

 

I gave an answer in the forum, but here is an expansion, and some ponderings.

First of all, let’s set out the terms of reference. The question asks about a “large unfamiliar” database. I think we can assume that “unfamiliar” is “one that we haven’t encountered before”, but what is “LARGE”? To me “large” could be:  
  •  Lots of tables
  • Many terror-bytes 😉
  • Lots of transactions
  • Lots of users
  • There may be other interpretations

I’m going to go with “Lots of tables”, with the definition of “lots of” being:

“more than I can conveniently hold in my head at one time”

I’ve also assumed that we are working with a “transactional database” rather than a “data warehouse”.

Preparation

Gilian, the questioner, was given some good suggestions, which I summarised as “Collecting Information” or perhaps “Preparation”:
  • Understand objectives of “The Business”
  • Understand the objectives of “This Project” (Digging into the Database)
  • Collect relevant Organisation charts and find out who is responsible for doing what
  • Collect relevant Process Models for the business processes which use the database
  • Get hold of, generate, or otherwise create a data model (Entity Relationship Diagram or similar)

Of these, the one which is specific to working with a Database is the ERD. Having a diagram is an enormous help in visualising how the bits of the database interact.
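
If nobody can hand you a diagram, the database catalogue will usually give up the raw material for one. Here is a minimal sketch of the idea in Python, using the built-in sqlite3 module as a stand-in for whatever database you are really facing. The three-table schema and the function name list_relationships are my own inventions for illustration; other database managers expose the same information through their own catalogue views.

```python
import sqlite3


def list_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    relationships = []
    for table in tables:
        # PRAGMA foreign_key_list yields one row per foreign-key column:
        # (id, seq, parent_table, from_column, to_column, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return sorted(relationships)


# Hypothetical miniature schema, just enough to have something to inspect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_order (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id));
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES customer_order(order_id));
""")

relationships = list_relationships(conn)
for child, child_col, parent, parent_col in relationships:
    # One parent row, many child rows: the lines you would draw on the ERD.
    print(f"{parent}.{parent_col} 1 --- * {child}.{child_col}")
```

The output is only a list of “one-to-many” lines, but that is exactly what you need to start sketching the diagram. It relies on the foreign keys actually being declared, which is not always the case in an old database.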

Chunking

For me, the next step is to divide the model into “chunks” containing groups of entities (or tables). This allows you to:  
  • Focus – on one chunk
  • Prioritise – one chunk may be more important or more interesting than another, or may need to be done first
  • Estimate – chunks are different sizes
  • Delegate – you do that chunk, I’ll do this one
  • And generally “Manage” the work: do whatever the project objectives require.

I would use several techniques to divide the database or model up into chunks. These techniques work equally well with logical and physical data models. It can be quite a lot of work if you have a large model. None of the techniques are particularly complicated, but they are a little tricky to explain in words.

Here is a list of techniques: 
  • Layering
  • Group around Focal Entities
  • Process Impact Groups
  • Realigning

Organise the Data Model

I cannot over-emphasise how important it is to have a well laid-out diagram. Some tools do it well, some do it less well. My preference is to have “independent things” at the top.

 

I’ve invented a business.
  • We take ORDERs from CUSTOMERs. 
  • Each ORDER consists of one or more ORDER_LINES and each line is for a PRODUCT.
  • We Deliver what the customer wants as DELIVERY CONSIGNMENTS. 
  • Each CONSIGNMENT contains one or more Batches of product (I haven’t got a snappy name for that).
  • We know where to take the consignment by magic, because we don’t have an Address for the Customer!
  • We reconcile quantities delivered against quantities ordered, because we sometimes have to split an order across several deliveries.
  • That’s it!

Layering

“Layering” involves classifying the entities or groups of entities as being about:
  • Classifications
  • Things
  • Transactions
  • Reconciliations

Things

Let’s start with “Things”. Things can be concrete or they can be abstract. We usually record a “Thing” because it is useful in doing our business. Examples of Things are:
  • People
  • Organisations
  • Products
  • Places
  • Organisation Units (within our organisation, or somebody else’s)

Classifications

Every business has endless ways of classifying “Things” or organising them into hierarchies. I just think of them as fancy attributes of the “Things” unless I’m studying them in their own right.
Note: “Transactions” can have classifications too (in fact almost anything can and does), I’ve just omitted them from the diagram!
Note: The same structure of “Classification” can apply to more than one thing. This makes sense if, for example, the classification is a hierarchy of “geographic area”. Put it in an arbitrary place, note that it belongs in other places as well, and move on!

Transactions

Transactions  are what the business is really interested in. They are often the focus of Business Processes.
  • Order
  • Delivery
  • Booking

 

Where there are parts of Transactions (eg Order_Line) keep the child with the parent.

Reconciliations

“Reconciliations” (between Transactions) occur when something is “checked against something else”. In this case we are recording that “6 widgets have been ordered” and that “3 (or 6) have been delivered”.
 If you use these “layers”, arranged as in the diagram,  you will very likely find that the “One-to-manys” point from the top (one) down (many) the page.
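
The “one-to-manys point down the page” rule is easy to check mechanically once each entity has been given a layer. The sketch below is only an illustration in Python. The top-to-bottom order of the layers, the CUSTOMER_TYPE entity and the list of relationships are my assumptions about the invented business, not something read from a real database.

```python
# Assumed top-to-bottom order of the layers on the page.
LAYER_ORDER = ["Classification", "Thing", "Transaction", "Reconciliation"]

# Assumed classification of the entities in the invented business.
LAYER_OF = {
    "CUSTOMER_TYPE": "Classification",   # hypothetical classification entity
    "CUSTOMER": "Thing",
    "PRODUCT": "Thing",
    "ORDER": "Transaction",
    "ORDER_LINE": "Transaction",
    "DELIVERY_CONSIGNMENT": "Transaction",
    "BATCH": "Transaction",
    "DELIVERED_ORDER_LINE": "Reconciliation",
}

# Assumed relationships, written as (one side, many side).
ONE_TO_MANY = [
    ("CUSTOMER_TYPE", "CUSTOMER"),
    ("CUSTOMER", "ORDER"),
    ("ORDER", "ORDER_LINE"),
    ("PRODUCT", "ORDER_LINE"),
    ("DELIVERY_CONSIGNMENT", "BATCH"),
    ("PRODUCT", "BATCH"),
    ("ORDER_LINE", "DELIVERED_ORDER_LINE"),
    ("BATCH", "DELIVERED_ORDER_LINE"),
]


def upward_relationships(layer_of, one_to_many):
    """Return the relationships whose 'one' end sits in a lower layer than the 'many' end."""
    rank = {layer: position for position, layer in enumerate(LAYER_ORDER)}
    return [(one, many) for one, many in one_to_many
            if rank[layer_of[one]] > rank[layer_of[many]]]


# An empty list means every one-to-many points down (or across) the page.
print(upward_relationships(LAYER_OF, ONE_TO_MANY))
```

Anything the function does return is worth a second look: either the entity is in the wrong layer, or the relationship is telling you something interesting.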

Groups around Focal Entities

To do this, pick an entity which is a “Thing” or a “Transaction” then bring together the entities which describe it, or give more detail about it. Draw a line round it, give it a name, even if only in your head!
  • “Customer and associated classifications” and
  • “Order and Order_line” are candidate groups.
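
If you prefer a list to a line drawn round boxes, the same grouping can be sketched in a few lines of Python. This is only an illustration: the DETAIL_OF mapping (which entity describes or gives detail about which) is my assumption about the invented business, and CUSTOMER_TYPE is a hypothetical classification entity.

```python
from collections import defaultdict

# Assumed: each entry says "this entity describes, or gives detail about, that one".
DETAIL_OF = {
    "CUSTOMER_TYPE": "CUSTOMER",
    "ORDER_LINE": "ORDER",
    "BATCH": "DELIVERY_CONSIGNMENT",
}

ENTITIES = [
    "CUSTOMER", "CUSTOMER_TYPE", "PRODUCT",
    "ORDER", "ORDER_LINE", "DELIVERY_CONSIGNMENT", "BATCH",
]


def focal_entity(entity, detail_of):
    """Follow the 'detail of' links until we reach an entity that stands on its own."""
    while entity in detail_of:
        entity = detail_of[entity]
    return entity


def focal_groups(entities, detail_of):
    """Group every entity under the focal entity it ultimately belongs to."""
    groups = defaultdict(list)
    for entity in entities:
        groups[focal_entity(entity, detail_of)].append(entity)
    return dict(groups)


groups = focal_groups(ENTITIES, DETAIL_OF)
for focal, members in groups.items():
    print(focal, "->", members)
```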

 

Process Impact Groups

To create a “Process Impact Group”
  • Select a business process
  • Draw lines around the entities which it: creates, updates and refers to as part of doing its work.
  • You should get a sort of contour map on the data model. 

 

In my example the processes are: 
  • Place Order
  • Assemble Delivery Consignment
  • Confirm Delivery (has taken place)

 

It is normal for there to be similarities between “Process Impact Groups” and “Focal Entity Groups”.  In fact, it would be unusual if there were not similarities!
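
The “contour map” can also be written down as a simple matrix of processes against entities. The Python below is a rough sketch. The letters (C for creates, U for updates, R for refers to) and the particular entries are my assumptions about how the three example processes touch the invented data model.

```python
# Assumed impact matrix: C = creates, U = updates, R = refers to.
PROCESS_IMPACT = {
    "Place Order": {
        "CUSTOMER": "R", "PRODUCT": "R", "ORDER": "C", "ORDER_LINE": "C",
    },
    "Assemble Delivery Consignment": {
        "ORDER": "R", "ORDER_LINE": "R", "PRODUCT": "R",
        "DELIVERY_CONSIGNMENT": "C", "BATCH": "C",
    },
    "Confirm Delivery": {
        "DELIVERY_CONSIGNMENT": "U", "BATCH": "R",
        "ORDER_LINE": "R", "DELIVERED_ORDER_LINE": "C",
    },
}


def impact_group(process, actions="CU"):
    """Entities the process touches with one of the given actions (default: creates or updates)."""
    return {entity for entity, action in PROCESS_IMPACT[process].items()
            if action in actions}


def shared_entities(process_a, process_b):
    """Entities that appear inside the contours of both processes."""
    return set(PROCESS_IMPACT[process_a]) & set(PROCESS_IMPACT[process_b])


print(sorted(impact_group("Place Order")))
print(sorted(shared_entities("Place Order", "Confirm Delivery")))
```

The entities a process creates or updates make the tight inner contour; adding the ones it only refers to gives the wider outer one. Entities shared between processes are where the chunks will need to talk to each other.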

Realigning

Try moving parts under headers (so, Order_line under Order) and reconciliations under the transaction which causes them. In the diagram, I’ve moved “Delivered Order Line” under “Delivery”, because it’s created by “Delivery related processes” rather than when the Order is created.

Finally, “Chunking”

Based on the insights you have gained from the above, draw a boundary around your “chunks”.
The various techniques are mutually supportive, not mutually exclusive. The chunks are of arbitrary size. If it is useful, you can: 
  • combine neighbouring chunks together or
  • you can use the techniques (especially “Focal Entities” and “Process Impact Groups”) to break them down until you reach a single table/entity.

 

Tools

My preferred tools for doing this are: a quiet conference room, late at night; the largest whiteboard I can find; lots of sticky Post-its or file cards (several colours); a pack of whiteboard pens; black coffee on tap and the prospect of beer when I’ve finished. For a large database (several hundred tables) it can take several days!
Once you’ve done all this, all you have to do is do the work!

 

I hope you have as much fun doing it as I had writing about it! 🙂

Security – Using Server Side security with MS SQL Server

There are times when I want to do something, and it gives rise to questions. Often I have to put the questions to one side in order to get on with my immediate priority, with the intention of coming back to them in the future. This is one of the occasions when I have had the opportunity to go back and answer some of them.
The questions in this case were:
  • How does Server-side security work on MS SQL Server? And
  • How can I control the access different users have to my database?

I’m not (have never been, and do not really intend to become) a DBA (Database Administrator). I understand a bit about “privileges” but in the past I’ve always had someone else “doing it for me”, and in any case, I’ve worked on other databases such as DB/2 or Oracle.

After a little bit of research and a bit of experimentation (fiddling around), I found the answer to my questions. Although it was hardly earth-shattering, the understanding is satisfying.
The key is to understand that in MS SQL Server the “Server Instance” and the “Database” are separate entities. It is possible to have several databases inside the same Server instance. In fact I do it all the time when I’m experimenting. This means that there are two separate things to define:
  • A LOGIN, which gives access to the Server Instance, and with Server-side authentication, provides the security, and
  • A USER, which belongs to the Database and is granted the Database privileges (including CONNECT).
Figure 1 Server-side security for MS SQL Server
Summary of the stages: 
  1. Ensure that SQL Server is set up to allow SQL Server Authentication (“mixed mode”)
  2. Create the LOGIN (in the SQL Server Instance)
  3. Create a USER corresponding with the LOGIN (in the Database)
  4. Grant the USER CONNECT privilege (happens by default, but can be revoked)
  5. Grant the USER the appropriate privileges on the Database
If you’re interested in taking this a bit further, I’ve summed all this up in a video on YouTube.
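
Stages 2 to 5 above come down to a handful of T-SQL statements. As a rough sketch, here is a little Python helper that assembles them as text, ready to paste into a query window. The helper names, the database “SalesDB”, the login “report_reader” and the placeholder password are all invented for illustration, and granting SELECT on the dbo schema is just one example of “appropriate privileges”. Stage 1 is a server setting, so it is not covered here.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_security_script(database, login, password, schema="dbo"):
    """Return the T-SQL statements for stages 2 to 5, in order."""
    safe_password = password.replace("'", "''")   # escape quotes inside the literal
    user = quote_name(login)                       # the USER is named after the LOGIN
    return [
        # Stage 2: the LOGIN lives in the Server Instance.
        f"CREATE LOGIN {user} WITH PASSWORD = '{safe_password}';",
        # Stage 3: the USER lives in the Database and maps to the LOGIN.
        f"USE {quote_name(database)};",
        f"CREATE USER {user} FOR LOGIN {user};",
        # Stage 4: normally granted by default; stated here to make it visible.
        f"GRANT CONNECT TO {user};",
        # Stage 5: an example privilege (read-only access to one schema).
        f"GRANT SELECT ON SCHEMA::{quote_name(schema)} TO {user};",
    ]


# Hypothetical names throughout; never keep a real password in a script like this.
script = build_security_script("SalesDB", "report_reader", "ChangeMe!123")
print("\n".join(script))
```

Seeing the statements side by side makes the split obvious: the first belongs to the Server Instance, and everything after the USE belongs to the Database.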

What is the point of Unit Testing?

I don’t normally add two entries to my blog on the same day but something attracted my interest. Somebody asked the question: “What is the point of Unit Testing?”
Actually, they asked two questions:
  • Why do we perform Unit Testing, even though we are going to do System testing?
  • What are the benefits of Unit Testing?
To which my immediate thought responses were:
  • Would you deliberately make something from parts you knew were broken? Or
  • Would you make something from parts which you suspected were broken?
And then I thought “that might sound a little rude” and reconsidered…
Modern development methods have lots of benefits, but sometimes in flexible and rapid methods something gets lost. People forget why things are done. Or, if they’ve never been told, they wonder if they are worth bothering with.
Now, you should always question everything, but sometimes things are there for a good reason. If you plan to take something away;
  • Understand why it was there in the first place,
  • If it is no longer needed, explain why it is no longer needed.
  • Understand (and be prepared to live with) the consequences of taking it away.

An old-fashioned view of a System Development Process

(by the way, you’ll notice that some of this material has been re-cycled from elsewhere)
If we take a rather old-fashioned view of systems development using a “waterfall” model, then we will have a number of phases (this is an old IBM Development Process, but which one doesn’t really matter).
  • Each phase produces something, and the stage below it expands it (produces more “things”) and adds detail.
  • The “Requirements” specify what things the system needs to do. They also identify the things that need to be visible on the surface of the system.
  • For each of the things that need to be visible on the surface, we need an “External Design”
  • The “External Design” specifies the appearance and functional behaviour of the system.
  • For everything we have in the “External Design” we need a design for the “Internals”
  • And finally someone needs to “Build” what we have specified.
You can view this as a waterfall, down the side of a valley. The process is one of decomposition.
I don’t especially recommend “Waterfall” as a way of running a development project, but it is a simple model which is useful as an illustration.

The Testing Process

On the other side of the valley we build things up.
  • Units are tested.
  • When they work they are aggregated into “Modules” or “Assemblies” or “Subsystems”, which are tested.
  • These assemblies are assembled into the System which is tested as a whole.
  • Finally the System is tested by representatives of the Users.
The process is one of developing “bits”, testing the bits and then assembling the bits and then testing the assembly.
The assembly process (in the sense of “putting things together”, not compiling a file written in “assembler”) costs time and effort. Parts are tested as soon as practical after they are created and are not used until they conform to their specification. The benefit is that we are always working with things that we think work properly.
In a well-organised world, you would like to think that the Users are testing against the original requirements!

Development and Testing should be mutually supportive

What should happen is that at every level, each component or assembly should have some sort of specification (it may be a very rudimentary specification, but it should still exist) and it should be tested against that.
In fact, there is a thoroughly respectable development approach called “Test Driven Development”. The idea here is that the (Business) Analyst writes a “Test” which can be used to demonstrate that the system, at whatever level, is doing what it is supposed to be doing. Of course, the Analyst may need help to write an automated test, but the content should come from the Analyst.
This approach is really useful all the way through the development process. It’s a really good idea if a developer writes tests for the code s/he is writing before the code! In fact, I have known places where they insisted that a test was written for a bug before the developer attempted to fix the bug. That way demonstrating the fix was easy: Run the test without the fix – Test demonstrates the bug. Apply the fix and run the test again.
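
Here is a small sketch of that “test for the bug first” routine, using Python’s built-in unittest module. The function and the bug are invented for illustration: a calculation of the quantity still outstanding on an order line, where the hypothetical bug is that only the first delivery is counted when an order is split across several.

```python
import io
import unittest


def outstanding_buggy(ordered, deliveries):
    """Hypothetical buggy version: only counts the first delivery."""
    return ordered - deliveries[0]


def outstanding_fixed(ordered, deliveries):
    """Fixed version: counts every delivery made against the order line."""
    return ordered - sum(deliveries)


def make_test(outstanding):
    """Build the regression test for whichever version of the function we pass in."""
    class SplitDeliveryTest(unittest.TestCase):
        def test_order_split_across_two_deliveries(self):
            # 6 ordered, delivered as 3 + 3: nothing should be outstanding.
            self.assertEqual(outstanding(6, [3, 3]), 0)
    return SplitDeliveryTest


def passes(outstanding):
    """Run the regression test quietly and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test(outstanding))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()


print("without the fix:", passes(outstanding_buggy))
print("with the fix:", passes(outstanding_fixed))
```

In real life there would be one function and one test, and you would run the test before and after changing the code. I have kept both versions side by side only so the sketch can show the two runs together.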

The Cost of Not Doing Unit (or other low-level) Testing

If no low-level testing is done, all bugs are found at the topmost level. That means they are found after the product has been assembled or “built”, and we then have to work out where each error actually originated.

The Benefits of Unit Testing

  • Bugs are found sooner, and they are found closer to the point at which they are created.
  • Unit testing lends itself to automated testing which can be integrated with the build process. Ask a professional Java developer about “JUnit” or a Python developer about “unittest” (one word, all lower case).
  • Automated testing increases the chances of trapping “regression bugs” as code is enhanced and bugs are fixed.
All of the above mean that well-planned and executed Unit Testing results in:
  • Reduced overall cost
  • Improved product quality

Oracle and Courses

My personal development time last week was spent completing an online course “Oracle DBA for absolute beginners”.

I wouldn’t have described myself as an “absolute beginner”, but I found plenty to enjoy in the course and came away having learned quite a bit about what is going on inside Oracle, and I assume most other database managers.

Circumstances influence what we do in life and so far I have had much more exposure to DB/2 and MS SQL Server than to Oracle. That hasn’t been a decision on my part, simply the choices that had been made for the projects I was involved in.

In a similar way, I’ve spent much more time “dealing with users” as a Business Analyst, than I have working out how to manage the space requirements and performance of a database. It does me good to learn just a little about the things a DBA has to consider. I don’t have to let those considerations govern what I consider the requirements to be, but at least I can understand where other people are coming from.

Taking the course led me to what you might consider “meta” thinking: thinking not about the content of the course, but about the way it was presented and about Udemy, the platform on which it was presented.

I find Udemy interesting. It seems to work well. It certainly worked for me.

Udemy seem to be aiming to be a “neutral marketplace”. The courses belong to the course instructors. Of course Udemy have standards for courses, but beyond the usual “fit to print” conditions, they are mostly technical standards (quality of video and sound) rather than subject matter related. In a similar spirit, Udemy promote the platform, but the promotion I have seen seems to be fairly neutral with regard to individual courses. On the other hand, instructors or course owners are completely free to advertise their wares elsewhere and direct potential customers into Udemy. It’s a simple model which I think I will investigate further.

Another video – Splitting an Access database

When I’m making something I sometimes learn new things. They say “you should never stop learning”. I agree with “them”, whoever they may be.

While I was working on the SOPAG project, I investigated “splitting” the Access database into two components:

  • Logic and presentation, and
  • Data (database definitions and data values)

I knew this could be done, but I had not spent much effort on it before.

The splitting itself was a straightforward enough exercise. Most of the work is done by Access itself. However, you might want to confirm that all the decisions it has made are sensible!

I decided to document the results for my own benefit, and then decided to convert a scrappy PowerPoint presentation into something a little more presentable to upload to YouTube.

Here it is: Splitting an Access database

I had fun doing the work to find out how it worked, and fun making the video. I hope you get something from watching it.