From: | "Dawid Kuroczko" <qnex42(at)gmail(dot)com> |
---|---|
To: | "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Table clustering idea |
Date: | 2006-06-25 23:48:39 |
Message-ID: | 758d5e7f0606251648h4d518ca6k7e1c511ba316bb8b@mail.gmail.com |
Lists: | pgsql-hackers |
There is a well-known command called CLUSTER, which reorganizes a table
in the order of a specified index. Its drawback is that tuples added
afterwards are not kept in this order. Last night I had an idea which
could be interesting, I hope.
The idea is to make use of the 'histogram_bounds' statistics that are
already collected. Instead of inserting a row into the first suitable spot
in a table, the table would be "divided" into sections, one for each of
the histogram_bounds ranges. When inserting, the database would try to
find the most suitable section for the new row (using histogram_bounds),
and if there were free spots there, it would insert the row there. If not,
it would either look for a free spot in nearby sections, or fall back to
the first suitable place.
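
A minimal sketch of this placement rule in Python, only to make the idea
concrete. It is not PostgreSQL code: the function names and the
`free_slots` list (a count of free tuple slots per section) are
hypothetical, and the histogram is assumed to be a sorted list of bounds
in which N bounds delimit N-1 sections.

```python
from bisect import bisect_right

def target_section(histogram_bounds, value):
    """Return the index of the histogram range (section) that value falls into."""
    # N bounds delimit N-1 sections; values outside the sampled range are
    # clamped to the first or last section.
    idx = bisect_right(histogram_bounds, value) - 1
    return max(0, min(idx, len(histogram_bounds) - 2))

def choose_section(histogram_bounds, free_slots, value):
    """Pick the ideal section if it has room, else the nearest section that does.

    free_slots[i] is a hypothetical count of free tuple slots in section i.
    Returns None when no section has room, in which case the caller would
    fall back to the first suitable place in the table.
    """
    ideal = target_section(histogram_bounds, value)
    n = len(free_slots)
    for distance in range(n):
        # Look outwards from the ideal section, lower neighbour first.
        for candidate in (ideal - distance, ideal + distance):
            if 0 <= candidate < n and free_slots[candidate] > 0:
                return candidate
    return None
```

For example, with bounds `[0, 10, 20, 30, 40]` a value of 15 belongs to
section 1; if that section is full, the search moves to sections 0 and 2
before looking further away.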
What would it do? It would try to keep the table somewhat organized,
keeping rows with similar values close together (within SET STATISTICS
resolution, so a common scenario would be 50 or 100 "sections").
It would make it a bit harder for a table to shrink (since new rows would
be added throughout the table, not at the beginning).
An alternative to using histogram_bounds would be to use the position
of the key inside the index to determine the "ideal" place for the row
inside the table, and to find the closest free spot to it. This would of
course be much more precise and wouldn't rely on statistics.
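
The index-based variant could be sketched roughly as follows. This is
again only an illustration under assumptions: a sorted Python list stands
in for the index, the key's relative rank is mapped linearly onto heap
page numbers, and `ideal_page`, `closest_free_page` and `free_pages` are
hypothetical names, not anything that exists in PostgreSQL.

```python
from bisect import bisect_left

def ideal_page(sorted_keys, key, n_pages):
    """Map the key's relative position in index order to a heap page number."""
    if not sorted_keys or n_pages <= 0:
        return 0
    rank = bisect_left(sorted_keys, key)   # number of indexed keys smaller than key
    fraction = rank / len(sorted_keys)     # relative position, 0.0 to 1.0
    return min(int(fraction * n_pages), n_pages - 1)

def closest_free_page(free_pages, target):
    """Return the page with free space nearest to target, or None if there is none."""
    # Ties are broken in favour of the lower page number.
    return min(free_pages, key=lambda p: (abs(p - target), p), default=None)
```

A real implementation would have to derive the position from the btree
itself rather than from a flat list, which is the part this sketch leaves
out.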
Regards,
Dawid