Re: Streaming large data into postgres [WORM like applications]

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: "John D(dot) Burger" <john(at)mitre(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Streaming large data into postgres [WORM like applications]
Date: 2007-05-14 13:05:59
Message-ID: 46485EB7.80103@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

John D. Burger wrote:
> Dhaval Shah wrote:
>
>> 2. Most of the streamed rows are very similar. Think syslog rows,
>> where for most cases only the timestamp changes. Of course, if the
>> data can be compressed, it will result in improved savings in terms of
>> disk size.
>
> If it really is usually just the timestamp that changes, one way to
> "compress" such data might be to split your logical row into two
> tables. First table has all the original columns but the timestanp,
> plus an ID. Second table has the timestamp and a foreign key into the
> first table. Depending on how wide your original row is, and how often
> it's only the timestamp that changes, this could result in decent
> "compression".
>
> Of course, now you need referential integrity.

I thought of something similar. Maybe you could put those timestamps in
an array column; saves you a referential integrity check that you don't
seem to need very much.

OTOH, _if_ your log messages(?) look very similar each time, you may be
able to turn the problem around; you store unique log messages, with the
timestamps that they occured on. That way you rarely need to store more
than a timestamp. It'll add time to look up the matching log message
(there'll be quite a few less of them though). I'm quite confident
you'll save time inserting records this way, although that's hand waving
at this point.

You may be able to parameterize some log messages and store the
parameters with the timestamps. Takes a bit more processing though
(regular expression matches maybe?), and you'll want to now all
different log message permutations beforehand.

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2007-05-14 13:20:30 Re: EXCEPTION clause not identified
Previous Message Ron Johnson 2007-05-14 13:03:03 Re: primary key index