Re: newbie table design question

From: Andrew Smith <laconical(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: newbie table design question
Date: 2009-06-01 06:41:08
Message-ID: 300497a30905312341l4ac7bd5cxd4b10849c305d869@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jun 1, 2009 at 1:25 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Smith <laconical(at)gmail(dot)com> writes:
> > I'm a beginner when it comes to Postgresql, and have a table design
> question
> > about a project I'm currently working on. I have 1500 data items that
> need
> > to be copied every minute from an external system into my database. The
> > items have a timestamp, an identifier and a value. For example:
>
> > 12/10/2008 05:00 ID_ABC 14
> > 12/10/2008 05:01 ID_ABC 17
> > 12/10/2008 05:02 ID_ABC 13
>
> > Pretty simple stuff. The current solution (implemented using SQL Server
> a
> > few years ago) looks like this (an approximation using Postgresql
> syntax):
>
> > CREATE TABLE "DataImport"
> > (
> > "DataImportID" serial NOT NULL PRIMARY KEY,
> > "Time" timestamp without time zone NOT NULL,
> > "ID_ABC" integer NOT NULL,
> > "ID_DEF" integer NOT NULL,
> > "ID_HIJ" integer NOT NULL,
> > etc
> > );
>
> So the table would have ~1500 columns? You definitely don't want to do
> it that way in Postgres --- you'd be way too close to the maximum column
> count limitation.
>

After my original post, I found out that the current solution in SQL Server
actually had to be split into two different tables due to a similar
limitation.

>
>
> > My initial thought for the design of the new solution looks like this:
>
> > CREATE TABLE "DataImport"
> > (
> > "DataImportID" serial NOT NULL PRIMARY KEY,
> > "Time" timestamp without time zone NOT NULL,
> > "Identifier" text NOT NULL,
> > "Value" integer NOT NULL
> > );
>
> DataImportID couldn't be a primary key here, could it? Or is it just
> meant as an artificial primary key? If so, consider not bothering with
> it at all --- (Time, Identifier) seems like a perfectly good natural
> key, and with such short records the extra space for a serial column
> is not negligible.
>

You're right, I guess there doesn't seem to be much point having that
surrogate key in place...and it does take up space. I did a quick test and
got the following figures:

1 million records with DataImportID = ~80mb
1 million records without DataImportID = ~50mb.

That'll certainly add up over time.

> Anyway, my answer would definitely lean towards using this normalized
> representation, if all the data values are integers. (If they're not,
> it gets messier...)
>
> > Users will then be doing regular queries on this data (say, a few hundred
> > times per day), such as:
>
> > SELECT "Time", "Value" FROM "DataImport" WHERE "Identifier" = 'ID_ABC'
> AND
> > "Time" between '2008-11-07' and '2008-11-11';
>
> An index on (Identifier, Time) (not the other way around) should work
> well for that type of query.

I'll give it a try.

>
>
> > My concern is that 1500 values * 14400 minutes per day = 21,600,000
> > records. Add this up over the course of a month (the length of time I
> need
> > to keep the data in this table) and I'll have more than half a billion
> > records being stored in there.
>
> That's not an especially big table. However it might be worth your
> trouble to use partitioning. Not to speed searches, particularly, but
> to make it easier to drop 1/30th of the data each day.

Re-reading this I noticed that I had an extra 0 in one of my figures -
there's only 1440 minutes in a day, so my table gets down to the much more
manageable 2.16 million records per day instead of 21.6 million. I'll have
a look into partitioning too, thanks for your help!

Cheers,

Andrew

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2009-06-01 07:05:59 Re: pg_dump & table space
Previous Message Gurjeet Singh 2009-06-01 06:36:36 Query to find Foreign Key column data type mismatch