How to “dig into a large database”?

This posting was prompted by a question on one of the BA forums on Linked In. The original question was:


I gave an answer in the forum, but here is an expansion, and some ponderings.

First of all, let’s set out the terms of reference. The question asks about a “large unfamiliar” database. I think we can assume that “unfamiliar” is “one that we haven’t encountered before”, but what is “LARGE”? To me “large” could be:  
  •  Lots of tables
  • Many terror-bytes 😉
  • Lots of transactions
  • Lots of users
  • There may be other interpretations

I’m going to go with “Lots of tables” with the definition or “lots of” being:

“more than I can conveniently hold in my head at one time”

I’ve also assumed that we are working with a “transactional database” rather than a “data warehouse”.


Gilian, the questioner was given some good suggestions, which I summarised as “Collecting Information” or perhaps “Preparation”:
  • Understand objectives of “The Business”
  • Understand the objectives of “This Project” (Digging into the Database)
  • Collect relevant Organisation charts and find out who is responsible for doing what
  • Collect relevant Process Models for the business processes which use the database
  • Get hold of, generate, or otherwise create a data model (Entity Relationship Diagram or similar)

Of these, the one which is specific to working with a Database is the ERD. Having a diagram is an enormous help in visualising how the bits of the database interact.


For me, the next step is to divide the model into “chunks” containing groups of entities (or tables). This allows you to:  
  • Focus – on one chunk
  • Prioritise – one chunk is more important, interesting or will be done before, another
  • Estimate – chunks are different sizes
  • Delegate – you do that chunk, I’ll do this one
  • And generally “Manage” the work;  do whatever are the project objectives.

I would use several techniques to divide the database or model up into chunks. These techniques work equally well with logical and physical data models. It can be quite a lot of work if you have a large model. None of the techniques are particularly complicated, but they are a little tricky to explain in words.

Here is a list of techniques: 
  • Layering
  • Group around Focal Entities
  • Process Impact Groups
  • Realigning

Organise the Data Model

I cannot over-emphasis how important it is to have a well-laid out diagram. Some tools do it well, some do it less well. My preference is to have “independent things” at the top.


I’ve invented a business.
  • We take ORDERs from CUSTOMERs. 
  • Each ORDER consists of one or more ORDER_LINES and each line is for a PRODUCT.
  • We Deliver what the customer wants as DELIVERY CONSIGNMENTS. 
  • Each CONSIGNMENT contains one or more Batches of product (I’ve haven’t got a snappy name for that).
  • We know where to take the consignment by magic, because we don’t have an Address for the Customer!
  • We reconcile quantities delivered against quantities ordered, because we sometimes have to split an order across several deliveries.
  • That’s it!


“Layering” involves classifying the entities or groups of entities as being about:
  • Classifications
  • Things
  • Transactions
  • Reconciliations


Let’s start with “Things”. Things are can be concrete or they can be abstract. We usually record a “Thing” because it is useful in doing our business. Examples of Things are:
  • People
  • Organisations
  • Products
  • Places
  • Organisation Units (within our organisation, or somebody elses)


Every business has endless ways of classifying “Things” or organising them into hierarchies. I just think of them as fancy attributes of the “Things” unless I’m studying them in their own right.
Note: “Transactions” can have classifications too (in fact almost anything can and does), I’ve just omitted them from the diagram!
Note: The same structure of “Classification” can apply to more than one thing. This makes sense if, for example, the classification is a hierarchy of “geographic area”. Put it in an arbitrary place, note that it belongs in other places as well, and move on!


Transactions  are what the business is really interested in. They are often the focus of Business Processes.
  • Order
  • Delivery
  • Booking


Where there are parts of Transactions (eg Order_Line) keep the child with the parent.


Reconciliations” (between Transactions) occur when something is “checked against something else”. In this case we are recording that “6 widgits have been ordered” and that “3 (or 6) have been delivered”.
 If you use these “layers”, arranged as in the diagram,  you will very likely find that the “One-to-manys” point from the top (one) down (many) the page.

Groups around Focal Entities

To do this, pick an entity which is a “Thing” or a “Transaction” then bring together the entities which describe it, or give more detail about it. Draw a line round it, give it a name, even if only in your head!
  • “Customer and associated classifications” and
  • “Order and Order_line” are candidate groups.


Process Impact Groups

To create a “Process Impact Group”
  • Select a business process
  • Draw lines around the entities which it: creates, updates and refers to as part of doing its work.
  • You should get a sort of contour map on the data model. 


In my example the processes are: 
  • Place Order
  • Assemble Delivery Consignment
  • Confirm Delivery (has taken place)


It is normal for there to be similarities between “Process Impact Groups” and “Focal Entity Groups”.  In fact, it would be unusual if there were not similarities!


Try moving parts under headers (so, Order_line under Order) and reconciliations under the transaction which causes them. In the diagram, I’ve moved “Delivered Order Line” under “Delivery”, because it’s created by “Delivery related processes” rather than when the Order is created.

Finally, “Chunking”

Based on the insights you have gained from the above, draw a boundary around your “chunks”.
The various techniques are mutually supportive, not mutually exclusive. The chunks are of arbitrary size. If it is useful, you can: 
  • combine neighbouring chunks together or
  • you can use the techniques (especially “Focal entities” and “Process Entity Groups”) to break them down until you reach a single table/entity.



My preferred tools for doing this are: a quiet conference room, late at night; the largest whiteboard I can find; lots of sticky Post-its or file cards (several colours); a pack of whiteboard pens; black coffee on tap and the prospect of beer when I’ve finished. For a large database (several hundred tables) it can take several days!
Once you’ve done all this, then all you have to do, is do the work!  


 I hope you have as much fun doing it as I had writing about it! J

2 thoughts on “How to “dig into a large database”?”

  1. Thanks Phlip, I'm glad you found it interesting and hope you find it useful. I'm trying to make regular postings here, but as always, "the project" keeps getting in the way!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.