Quick Links

Re: Auto-clustering?

From:	Royce Ausburn <royce(dot)ml(at)inomial(dot)com>
To:	Pierre C <lists(at)peufeu(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Auto-clustering?
Date:	2010-12-19 03:07:48
Message-ID:	7D858BE2-BA54-4917-B9F1-2E0A7B49C8F8@inomial.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 17/12/2010, at 9:20 PM, Pierre C wrote:

>
>> fc=# explain analyse select collection, period, tariff, sum(bytesSent), sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as startchunk from sample_20101001 where starttime between 1287493200 and 1290171599 and collection=128 and ip = '10.9.125.207' group by startchunk, tariff, collection, period;
>
> If CLUSTER locks bother you, and you don't do UPDATEs, you might consider doing something like this :
>
> - accumulate the rows in a "recent" table
> - every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your cluster fields)
> - DELETE FROM recent the rows you just inserted
> - VACUUM recent
>
> The cluster in your archive table will not be perfect but at least all rows from 1 source in 1 hour will be stored close together. But clustering doesn't need to be perfect either, if you get 100x better locality, that's already good !

That's a really decent idea and can slot in perfectly well with how the application already works! We have existing DBAO code that handles monthly tables; it'll happily pop data in to a recent table.... In fact we can probably tolerate having a "today" table. Thanks!

--Royce

In response to

Re: Auto-clustering? at 2010-12-17 10:20:01 from Pierre C

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jim Nasby	2010-12-19 18:22:01	Re: Strange optimization - xmin,xmax compression :)
Previous Message	Royce Ausburn	2010-12-18 22:39:38	Re: Auto-clustering?