Table with active and historical data

From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Cc: srobertjames(at)gmail(dot)com
Subject: Table with active and historical data
Date: 2011-06-01 23:30:11
Message-ID: BANLkTimcy1BhLgzbR1pc4aMmgOt=90ch+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a table with a little active data and a lot of historical data.
I'd like to be able to access the active data very quickly - quicker
than an index. Here are the details:

1. Table has about 1 million records
2. Has a column active_date - on a given date, only about 1% are
active. active_date is indexed and clustered on.
3. Many of my queries are WHERE active_date = today. Postgres uses
the index for these, but still lakes quite a lot of time. I repeat
these queries regularly.
4. I'd like to somehow presort or partition the data so that Postgres
doesn't have to do an index scan each time. I'm not sure how to do
this? Idea? I know it can be done with inheritance and triggers (
http://stackoverflow.com/questions/994882/what-is-a-good-way-to-horizontal-shard-in-postgresql
), but that method looks a little too complex for me. I'm looking for
something simple.
5. Finally, I should point out that I still do a large number of
queries on historical data as well.

What do you recommend? Ideas? Also: Why doesn't cluster on active_date
solve the problem? Specifically, if I run SELECT * FROM full_table
WHERE active_date = today, I get a cost of 3500. If I first select
those records into a new table, and then do SELECT * on the new table,
I get a cost of 64. Why is that? Why doesn't clustering pregroup
them?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2011-06-02 00:50:30 Re: Mixed up protocol packets in server response?
Previous Message Prafulla Tekawade 2011-06-01 22:08:05 Column aliases in having or where for Postgres 8.3