Re: Strategies/Best Practises Handling Large Tables

From: Chitra Creta <chitracreta(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strategies/Best Practises Handling Large Tables
Date: 2012-11-15 11:34:51
Message-ID: CABkVLeM8sYBuFOm6s94Pr1S8uwdTCkkJ+CqBG3s+ZfHkk2drLw@mail.gmail.com
Lists: pgsql-general

Thanks for your example, Chris. I will look into it as a long-term solution.
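
To check that I have understood the checkpoint idea correctly, this is
roughly how I picture it. The column types, the journal_line column names
and the query below are only my own sketch, not your actual schema:

    CREATE TABLE account_checkpoint (
        account_id integer NOT NULL,
        end_date   date    NOT NULL,
        debits     numeric NOT NULL,
        credits    numeric NOT NULL,
        balance    numeric NOT NULL,
        PRIMARY KEY (account_id, end_date)
    );

    -- Current balance = latest checkpoint plus activity recorded after it.
    -- journal_line.amount and journal_line.entry_date are guessed names.
    SELECT cp.balance
           + COALESCE((SELECT sum(jl.amount)
                       FROM journal_line jl
                       WHERE jl.account_id = cp.account_id
                         AND jl.entry_date > cp.end_date), 0) AS current_balance
    FROM account_checkpoint cp
    WHERE cp.account_id = 42
    ORDER BY cp.end_date DESC
    LIMIT 1;

If that is the general idea, then, as you say, I would also need a trigger
to reject any row dated before the latest end_date for an account.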

Partitioning tables as a strategy worked very well indeed. This will be my
short/medium-term solution.
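
For anyone else following the thread, the sort of setup I mean is the usual
inheritance-based partitioning, roughly like this (table and column names
here are just placeholders, not my actual schema):

    CREATE TABLE measurement (
        id      bigserial,
        logdate date NOT NULL,
        payload text
    );

    -- One child table per month, constrained so the planner can skip it.
    CREATE TABLE measurement_2012_11 (
        CHECK (logdate >= DATE '2012-11-01' AND logdate < DATE '2012-12-01')
    ) INHERITS (measurement);

    CREATE INDEX ON measurement_2012_11 (logdate);

    -- With constraint_exclusion = partition (the default), queries that
    -- filter on logdate only scan the matching children. An INSERT trigger
    -- on measurement is still needed to route new rows to the right child.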

Another strategy that I would like to evaluate as a short/medium-term
solution is archiving a table's old records before purging them.

I am aware that Oracle provides a tool that allows records to be exported
into a file or archive table before purging them, as well as a tool to
re-import those records.

Does PostgreSQL have similar tools to export records to a file and
re-import them?

If PostgreSQL does not have such a tool, does anyone have ideas on what
file format (e.g. a delimited text file with a header row of column names
and one record per line) would be ideal for easy re-importing into a
PostgreSQL table?
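
For example, if there is nothing more purpose-built, something along the
lines of COPY is what I had in mind (the file path, cut-off date and table
name below are just placeholders):

    -- Export rows older than the cut-off to a CSV file with a header row
    -- (or use \copy from psql to write the file on the client side).
    COPY (SELECT * FROM my_large_table WHERE created_at < DATE '2011-01-01')
      TO '/var/backups/my_large_table_pre2011.csv' WITH (FORMAT csv, HEADER);

    -- Purge the archived rows once the export has been verified.
    DELETE FROM my_large_table WHERE created_at < DATE '2011-01-01';

    -- Re-import later if the records are ever needed again.
    COPY my_large_table
      FROM '/var/backups/my_large_table_pre2011.csv' WITH (FORMAT csv, HEADER);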

Thank you for your ideas.

On Mon, Oct 22, 2012 at 12:14 AM, Chris Travers <chris(dot)travers(at)gmail(dot)com> wrote:

>
>
> On Sun, Oct 21, 2012 at 5:46 AM, Chitra Creta <chitracreta(at)gmail(dot)com> wrote:
>
>> Thank you all.
>>
>> Ryan, would you mind sharing your one-time function to move it?
>>
>> Merlin, what are your suggestions to improve query performance?
>>
>> Shaun, thank you. I will look into facts and dimensions should all else
>> fail.
>>
>> Chris, would you mind giving me an example of what you mean by your log,
>> aggregate and snapshot approach? Also, with indexing, I believe composite
>> and partial indexes are better than ordinary indexes, am I correct? Do
>> you have any recommendations as to which type (e.g. btree, hash) is
>> better for which situations?
>>
>
> Sure. Suppose I have an accounting system.
>
> I may record the transaction amounts in journal_entry and journal_line
> tables. These will be write-once, read-many. However, over time you will
> end up having to digest millions of records (given sufficient volume) to
> find out the balance of a checking account, and this is not really ideal.
>
> So to deal with this, I might, for example, add a table called
> account_checkpoint which might have the following fields:
>
> account_id
> end_date
> debits
> credits
> balance
>
> Then, on closing of the books, I can snapshot the accumulated debits,
> credits, and balance to date. If I need any of these numbers I can just
> grab the appropriate number from account_checkpoint and roll forward from
> end_date. If I have too much volume I can have closings at a monthly
> level, or whatever.
>
> The checkpoint table contains sufficient information for me to start a
> report at any point and end it at any other point without having to scan
> intervening checkpointed periods. Additionally, if I want to purge old
> data, I can do so without losing current balance information.
>
> In essence, this approach gives you a way to purge without losing the
> aggregated information, and a way to skip portions of the table for
> aggregation purposes that you could not otherwise skip. The big caveat is
> that you cannot insert any records effective before the most recent
> checkpoint (and if this is in doubt, you need to enforce it with a
> trigger).
>
> Best Wishes,
> Chris Travers
>
