RE: Database Design Question

From: Joel Burton <jburton(at)scw(dot)org>
To: Jimmie Fulton <JFulton(at)ehso(dot)emory(dot)edu>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: RE: Database Design Question
Date: 2001-07-28 03:22:50
Message-ID: Pine.LNX.4.21.0107272316430.18148-100000@olympus.scw.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Fri, 27 Jul 2001, Jimmie Fulton wrote:

> I'm by no means a crack database designer, but I do have my ideas on this
> subject. I prefer that every table has a unique integer (auto-incrementing)
> primary key. Why? Consistency. With the alternative, some tables may not
> have a clear-cut candidate for a unique id. In your example, you had
> "customer". How many "John Smith"s would it take before we decide that is
> not a good identifier. On the other hand, some tables would have perfectly
> logical identifiers. Part numbers, SSNs.... So, you would need to create
> some tables with integer primary keys, and others would have some other
> natural identifier. That to me is inconsistent. Every table should be,
> IMHO, predictable in it's definition of a primary key. I don't even have to
> guess what the names of my primary keys are either because the are all named
> <tablename>_ID. Always. I've only come up with these thoughts on my own,
> and have not extensively tried the other way, so I'd be interested in
> hearing other's ideas for the other side.

For large scale databases, there are theories aplenty about proper naming,
etc., and these should be be investigated. For small/simple databases,
this might be overkill in complexity and learning curve.

I teach a series of classes on small-scale database design for nonprofit
organizations, and recommend some simple rules:

* for tables where there is no candidate key (ie a Person table where
there is no SSN, etc.), use the table name + "id", and use a SERIAL-type.

* for tables where there is a candidate key, and that candidate key meets
all the usual criteria (always present, always unique, changes very
rarely), use the table name + "code", and use the appropriate type
(text/int/whatever), UNLESS

* there exists a very common name for this piece of info. Rather than
calling a SSN a "personcode" (in the above example), call it the SSN

* always put the primary key first in the table

Why not always use a SERIAL integer? It's a pain when a perfectly good
value exists. For example, in a table keeping track of US states, their
capitals, and governors, does it really make sense to code Maryland as
"45", when a memorable, unique, unlikely-to-change code already exists
("md")? Using a random number when a real-world code could do only forces
your user to do that lookup themselves.

[apologies to the international readers: Maryland is a state in the USA,
and "MD" is the postal code abbreviation for it]

I think that you could make some basic rules that would give you a system
that is intuitive and easy-to-rememeber, _without_ sacrificing clarity for
rigidity.

hth,
--
Joel Burton <jburton(at)scw(dot)org>
Director of Information Systems, Support Center of Washington

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2001-07-28 03:48:44 Re: Why does this plpgslq always return 1?
Previous Message John Oakes 2001-07-28 03:00:21 Why does this plpgslq always return 1?