Re: Strategies/Best Practises Handling Large Tables

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Chitra Creta <chitracreta(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strategies/Best Practises Handling Large Tables
Date: 2012-10-21 13:14:10
Message-ID: CAKt_ZfvRw-98s3MJYxrvj5rmA7jkB4cWKC4byJT2egeuZtnM=g@mail.gmail.com
Lists: pgsql-general

On Sun, Oct 21, 2012 at 5:46 AM, Chitra Creta <chitracreta(at)gmail(dot)com> wrote:

> Thank you all.
>
> Ryan, would you mind sharing your one-time function to move it?
>
> Merlin, what are your suggestions to improve query performance?
>
> Shaun, thank you. I will look into facts and dimensions should all else
> fail.
>
> Chris, would you mind giving me an example of what you mean by your log,
> aggregate and snapshot approach. Also, with indexing, I believe composite
> and partial indexes are better than plain indexes, am I correct? Do you
> have any recommendations as to which type (e.g. btree, hash) is better for
> which situations?
>

Sure. Suppose I have an accounting system.

I may record the amounts of the transactions in a journal_entry and
journal_line table. These will be write once, read many. Over time, however,
you will end up having to digest millions of records (given sufficient
volume) just to find out the balance of a checking account, and that is not
really ideal.

So to deal with this, I might, for example, add a table called
account_checkpoint which might have the following fields:

account_id
end_date
debits
credits
balance
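
For concreteness, that table might be declared along these lines. This is
only a sketch: the column types, the primary key, and the referenced
account table are my assumptions, not part of any particular schema.

```sql
-- Sketch only: types and the account(id) reference are assumptions.
CREATE TABLE account_checkpoint (
    account_id  integer NOT NULL REFERENCES account (id),
    end_date    date    NOT NULL,
    debits      numeric NOT NULL,  -- accumulated debits through end_date
    credits     numeric NOT NULL,  -- accumulated credits through end_date
    balance     numeric NOT NULL,  -- running balance as of end_date
    PRIMARY KEY (account_id, end_date)
);
```

The composite primary key lets you keep one row per account per closing
date, so old checkpoints remain available for historical reports.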

Then, on closing of the books, I can snapshot the accumulated debits,
credits, and balance to date. If I need any of these numbers I can just
grab the appropriate row from account_checkpoint and roll forward from
end_date. If I have too much volume I can have closings at a monthly level
or whatever interval fits.
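
The roll-forward might look something like the following. The
journal_line and journal_entry column names (amount, entry_id, post_date)
are assumed for illustration; adjust to your actual schema.

```sql
-- Current balance for one account: latest checkpoint plus activity since.
-- Column names on journal_line/journal_entry are assumptions.
SELECT cp.balance
       + coalesce((SELECT sum(jl.amount)
                     FROM journal_line jl
                     JOIN journal_entry je ON je.id = jl.entry_id
                    WHERE jl.account_id = cp.account_id
                      AND je.post_date > cp.end_date), 0) AS current_balance
  FROM account_checkpoint cp
 WHERE cp.account_id = 42          -- the account of interest
 ORDER BY cp.end_date DESC         -- most recent checkpoint
 LIMIT 1;
```

Only the journal rows dated after the last checkpoint are scanned, so the
cost stays bounded no matter how large the journal tables grow.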

The checkpoint table contains sufficient information for me to start a
report at any point and end it at any other point without having to scan
the intervening checkpointed periods. Additionally, if I want to purge old
data, I can do so without losing current balance information.

In essence, this approach gives you a way to purge without losing the
aggregated information, and a way to skip portions of the table for
aggregation purposes that you otherwise could not skip. The big caveat is
that you cannot insert (and if this is in doubt, you need to enforce it
with a trigger) any records effective before the most recent checkpoint.
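
One way to enforce that invariant is a BEFORE trigger on the journal
table. Again a sketch: the post_date column and single-ledger assumption
(one global max(end_date)) are mine; a multi-ledger system would scope the
check per account.

```sql
-- Assumed: journal_entry has a post_date column.
CREATE OR REPLACE FUNCTION forbid_backdated_entry() RETURNS trigger AS $$
BEGIN
    IF NEW.post_date <= (SELECT max(end_date) FROM account_checkpoint) THEN
        RAISE EXCEPTION 'cannot post into a closed (checkpointed) period';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER journal_entry_no_backdate
    BEFORE INSERT OR UPDATE ON journal_entry
    FOR EACH ROW EXECUTE PROCEDURE forbid_backdated_entry();
```

With this in place, a backdated insert fails loudly instead of silently
invalidating every checkpoint after its effective date.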

Best Wishes,
Chris Travers
