Table clustering idea

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Table clustering idea
Date: 2006-06-25 23:48:39
Message-ID: 758d5e7f0606251648h4d518ca6k7e1c511ba316bb8b@mail.gmail.com
Lists: pgsql-hackers

There is a well-known command, CLUSTER, which physically reorders a table
according to a specified index. Its drawback is that tuples added afterwards
are not kept in that order. Last night I had an idea which I hope is interesting.

The idea is to make use of the collected 'histogram_bounds' statistics.
Instead of inserting a row into the first suitable spot in a table, the table
would be "divided" into sections, one for each histogram_bounds range.
When inserting, the database would find the most suitable section for the
new row (using histogram_bounds) and, if that section had free space,
insert the row there. If not, it would either look for a free spot in nearby
sections or fall back to the first suitable place.
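
To make the lookup concrete, here is a rough sketch in Python (not backend
code). It assumes histogram_bounds is a sorted list of n+1 values delimiting
n sections, and that free space is tracked as a simple per-section counter;
the names section_for, choose_section and free_slots are made up for
illustration only.

```python
from bisect import bisect_right


def section_for(value, histogram_bounds):
    """Return the 0-based section whose histogram range contains value.

    histogram_bounds holds n+1 sorted boundaries delimiting n sections.
    Values outside the histogram are clamped to the first or last section.
    """
    n_sections = len(histogram_bounds) - 1
    idx = bisect_right(histogram_bounds, value) - 1
    return min(max(idx, 0), n_sections - 1)


def choose_section(value, histogram_bounds, free_slots):
    """Pick the section to insert into.

    free_slots[i] is a hypothetical count of free tuple slots in section i.
    Try the ideal section first, then sections at increasing distance.
    Return None when no section has room, so the caller can fall back to
    the first suitable place.
    """
    target = section_for(value, histogram_bounds)
    n = len(free_slots)
    for distance in range(n):
        for candidate in (target - distance, target + distance):
            if 0 <= candidate < n and free_slots[candidate] > 0:
                return candidate
    return None
```

With bounds [0, 10, 20, 30, 40] a value of 15 belongs to section 1; if that
section is full, the search widens to sections 0 and 2, then to section 3.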

What would this do? It would try to keep the table somewhat organized,
keeping rows with similar values close together (within SET STATISTICS
resolution, so a common scenario would be 50 or 100 "sections").
It would also make it a bit harder for the table to shrink (since new rows
would be added throughout the table, not at the beginning).

An alternative to histogram_bounds would be to use the position
of the key inside the index to determine the "ideal" place for the row inside
the table, and then find the closest free spot to it. This would of course be
much more precise and wouldn't rely on statistics.
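
A similarly rough Python sketch of the index-based variant. It models the
index as a plain sorted list of keys and assumes the set of pages with free
space is available as a sorted list of page numbers; ideal_page and
closest_free_page are hypothetical names, and a real index would have to
estimate the key's rank rather than read it off a list.

```python
from bisect import bisect_left


def ideal_page(key, index_keys, n_pages):
    """Map the key's rank in the sorted index onto a heap page number."""
    if not index_keys:
        return 0
    rank = bisect_left(index_keys, key)
    # Scale the rank proportionally onto the pages, clamping to the last page.
    return min(rank * n_pages // len(index_keys), n_pages - 1)


def closest_free_page(target, free_pages):
    """Return the page in free_pages (sorted) nearest to target, or None."""
    if not free_pages:
        return None
    i = bisect_left(free_pages, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = free_pages[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda page: abs(page - target))
```

For example, with eight indexed keys spread over four pages, a key that
ranks sixth maps to page 2; if only pages 0 and 3 have room, page 3 is
chosen as the closest.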

Regards,
Dawid
