Re: timestamp as primary key?

From: Richard Huxton <dev(at)archonet(dot)com>
To: Joe Kramer <cckramer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: timestamp as primary key?
Date: 2006-10-19 14:20:26
Message-ID: 453789AA.9040807@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Joe Kramer wrote:
> Hello,
>
> I have table for online chat system that keep messages sent between users.
>
> CREATE TABLE chat_message
> (
> message_time timestamp without time zone NOT NULL DEFAULT now(),

Hmm - timestamp without time zone. So you don't actually care when the
message was sent? Or you know all your users will be in one timezone and
don't care about calculating elapsed times, perhaps. You might want to
re-read the date+time part of the manuals again - the difference between
with/without time zone is subtle but important.

> message_body text,
> user_id_from bigint,

An 8-byte integer for user_id? And you're worried about 4 bytes for a
pkey. Well, if you've going to have billions of registered users then I
can see why you'll want to save that four bytes per row.

> user_id_to bigint,

I'm assuming you've left out fkey references to a user table.

> CONSTRAINT chat_message_pkey PRIMARY KEY (message_time)
> )
> WITHOUT OIDS;

Without OIDs is a good idea - especially if you're going to have
billions of users sending messages.

> I don't want to add int primary key because I don't ever need to find
> messages by unique id.

So why the meaningless constraint? You're not supposed to have a primary
key to keep the RDBMS police from kicking down your door and dragging
you away, you're supposed to have it one so you can distinguish
individual rows. If you genuinely don't want to be able to locate an
individual message then don't have a primary key at all. Don't lie to
yourself by choosing columns that aren't unique.

If you think you don't need a primary key at all, think about how you
would recover from the following:
1. Insert a batch of messages from archive.
2. Accidentally insert the same batch again.
3. Delete archive.
4. Notice duplicates.
Best of luck.

> Question: is it okay to use timestamp as primary key, or there is
> possibility of collision? (Meaning two processes may INSERT into table
> within same millisecond.) It is a web application.

You tell us. Is it possible that a database server could allocate the
same now() time for two rows given current hardware (or future hardware
for that matter). What happens if you do multiple inserts per
transaction? What about with multiple processors? What if the system
clock gets reset? What if you end up having two servers and need to
merge their message stores? Does this start to sound like a lot of
uncertain, uncontrollable things to be sure about? So long as you can
*guarantee* that it's impossible to duplicate the time you're fine.

I'd take a step back and check you're clear on all the use-cases for
this message store. I'm guessing you'll decide you *do* need to be able
to distinguish between different messages, that there is no pre-existing
primary key and that you'll want an auto-incremented integer primary-key.

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2006-10-19 14:24:13 Re: Overload after some minutes, please help!
Previous Message Michael Fuhr 2006-10-19 14:15:07 Re: problems installing pg on solaris