Re: Surrogate keys (Was: enums)

From: mark(at)mark(dot)mielke(dot)cc
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, "Pollard, Mike" <mpollard(at)cincom(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Surrogate keys (Was: enums)
Date: 2006-01-19 20:11:04
Message-ID: 20060119201104.GA7997@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 19, 2006 at 01:48:18PM -0600, Jim C. Nasby wrote:
> On Thu, Jan 19, 2006 at 02:01:14PM -0500, mark(at)mark(dot)mielke(dot)cc wrote:
> > On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote:
> > > > So ISTM it's much easier to just use surrogate keys and be
> > > >done with it. Only deviate when you have a good reason to do so.
> > > "The lazy man's guide to SQL database design", but Jim Nasby.
> > > ;-)
> > Hehe... I was thinking the same thing. I've definately seen cases
> > where the use of surrogate keys verges on ridiculous. It hasn't
> > harmed the application, except it terms of complexity. It still works.
> > It still performs fine. The SQL queries are awful looking. :-)
> Got an example?

Sure, but I have to be vague, because it's a company DB. :-)

The DB has a primary table, that maps the primary key to a surrogate
key. The surrogate key is used in several 1:1 and 1:N relationships.
Pretty straight forward. (I tend to put the primary key in the most
primary table that would have a 1:1 relationship, of which there is
one in this database - but whatever) The primary key is an identifier
used for all inputs and outputs to the application. It is used by
manual and automatic processes internal and external to the company.
Definately a primary key / surrogate key scenario.

The problem here, is that the primary key *is* a natural key. It
is generated to be unique, and it is immutable. There are no interfaces
provided to allow the rename of the key. It is a short character
string of 5 to 20 characters.

All queries to the table are joined with this primary key/surrogate
key table, to allow lookup by the primary key, for records only
identified by the surrogate key.

The database is only likely to have a few thousands records, with
the 1:N relationships not exceeding 5 or 10, and not recursive.
For performance, or disk space, it doesn't really matter which way
they went.

The confusion, though, of joining using a surrogate, that is
intended to be opaque (the value is never queried), ensures that
the program has no simple queries. All queries involve at least
one join.

I said almost ridiculous. It's not enough for me to complain, and
request a re-design. I don't really care what it does, as long as
it accepts my data, and allows me to query my data. But, it does
seem silly to me.

> > That's where I would tend to draw the line. For me, I find
> > implementation and maintenance to be the most expensive part of my
> > applications. My data hasn't yet become large enough to make disk
> > space, compute resources, or I/O bandwidth a serious concern.
> Which is exactly what my thought process is. If you mix surrogate and
> non-surrogate keys, how do you know which table has which? Sure, while
> you're actively writing the code it's not an issue, but what about 6
> months later? What about if someone else picks up the code?

It's usually pretty obvious, looking at a database diagram. You look
up the primary key, and see that it only shows up in one table. :-)

> I know Josh was poking fun with his comment about me being lazy, but
> lazy can make for better code. I can go back to code I wrote 3 years ago
> and I know that 99% of tables will have something_id (where something is
> almost certain to be the table name) as a surrogate key to join on;
> there's no need for me to go and figure out what does and what doesn't
> have a surrogate key. The 1% that don't fall into that generally aren't
> an issue because they're normally very large tables that nothing joins
> to.

I don't disagree with you. I just don't mind deciding to use a surrogate
key if I'm unsure, and not using a surrogate if it seems more effort than
gain.

> There's actually an article floating around somewhere about how lazy
> coders are good coders... :)

Dunno where it started, but that's one of the tenets of the developers
of Perl. Of course, with Perl 6, they admitted to having made quite a
few deisgn errors with Perl 5 and earlier... :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paesold 2006-01-19 20:14:23 Re: 8.0.5 Bug in unique indexes?
Previous Message Jim C. Nasby 2006-01-19 20:06:07 Re: No heap lookups on index