From: | PFC <lists(at)peufeu(dot)com> |
---|---|
To: | "Jim C(dot) Nasby" <decibel(at)decibel(dot)org> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Feature suggestion : FAST CLUSTER |
Date: | 2007-05-27 17:34:30 |
Message-ID: | op.tszxrsircigqcu@apollo13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby <decibel(at)decibel(dot)org>
wrote:
> On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:
>> This does not run a complete sort on the table. It would be about as
>> fast as your seq scan disk throughput. Obviously, the end result is
>> not as
>> good as a real CLUSTER since the table will be made up of several
>> ordered
>> chunks and a range lookup. Therefore, a range lookup on the clustered
>> columns would need at most N seeks, versus 1 for a really clustered
>> table.
>> But it only scans the table once and writes it once, even counting index
>> rebuild.
>
> Do you have any data that indicates such an arrangement would be
> substantially better than less-clustered data?
While the little benchmark that will answer your question is running,
I'll add a few comments :
I have been creating a new benchmark for PostgreSQL and MySQL, that I
will call the Forum Benchmark. It mimics the activity of a forum.
So far, I have got interesting results about Postgres and InnoDB and will
publish an extensive report with lots of nasty stuff in it, in, say, 2
weeks, since I'm doing this in spare time.
Anyway, forums like clustered tables, specifically clusteriing posts on
(topic_id, post_id), in order to be able to display a page with one disk
seek, instead of one seek per post.
PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster
since I run it on dual core ; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance. This means InnoDB
will, even though it sucks and is awfully bloated, run a lot faster than
postgres if things become IO-bound, ie. if the dataset is larger than RAM.
Postgres needs to cluster the posts table in order to keep going. CLUSTER
is very slow. I tried inserting into a new posts table, ordering by
(post_id, topic_id), then renaming the new table in place of the old. It
is faster, but still slow when handling lots of data.
I am trying other approaches, some quite hack-ish, and will report my
findings.
Regards
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Staubo | 2007-05-27 18:27:31 | Re: Feature suggestion : FAST CLUSTER |
Previous Message | Stefan Kaltenbrunner | 2007-05-27 16:34:22 | Re: Domains versus Check Constraints |