| From: | Decibel! <decibel(at)decibel(dot)org> |
|---|---|
| To: | Bruce Momjian <bruce(at)momjian(dot)us> |
| Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Index AM change proposals, redux |
| Date: | 2008-04-24 16:24:29 |
| Message-ID: | 39CFF9DB-E0B5-4B69-B975-94FA120D5EEA@decibel.org |
| Lists: | pgsql-hackers |
On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:
Bruce asked if these should be TODOs...
>> Index compression is possible in many ways, depending upon the
>> situation. All of the following sound similar at a high level, but each
>> covers a different use case.
>>
>> * For Long, Similar data e.g. Text we can use Prefix Compression
>> We still store one pointer per row, but we reduce the size of the index
>> by reducing the size of the key values. This requires us to reach inside
>> datatypes, so isn't a very general solution but is probably an important
>> one in the future for Text.
I think what would be even more useful is doing this within the table
itself, and then bubbling that up to the index.
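To make the prefix-compression idea concrete, here is a rough sketch in Python. It is not PostgreSQL code; the function names are made up for illustration, and it assumes the keys are plain strings already in sorted order (as they would be on an index page). Each key is stored as the number of leading characters it shares with the previous key, plus the remaining suffix.

```python
import os


def prefix_compress(sorted_keys):
    """Front-code sorted strings as (shared_prefix_len, suffix) pairs.

    Hypothetical helper: each entry records how many leading characters
    it shares with the previous key, plus the remaining suffix.
    """
    out = []
    prev = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

There is still one entry per row, so only the key bytes shrink, which matches the description quoted above.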
>> * For Unique/nearly-Unique indexes we can use Range Compression
>> We reduce the size of the index by holding one index pointer per range
>> of values, thus removing both keys and pointers. It's more efficient
>> than prefix compression and isn't datatype-dependent.
Definitely.
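A toy illustration of the range idea, again in Python and again with invented names. It assumes the table rows are physically ordered by the key, so that each block covers a contiguous range of values; that assumption is mine, not something stated in the proposal. The index keeps only the lowest key of each block, and a lookup finds the candidate block and then searches inside it.

```python
from bisect import bisect_right


def build_range_index(blocks):
    """Keep one (min_key, block_no) entry per non-empty block.

    `blocks` is a list of key lists, assumed globally sorted
    (block 0 holds the smallest keys, and so on).
    """
    return [(blk[0], n) for n, blk in enumerate(blocks) if blk]


def range_lookup(index, blocks, key):
    """Return (block_no, offset) for `key`, or None if it is absent."""
    mins = [m for m, _ in index]
    pos = bisect_right(mins, key) - 1  # last block whose min_key <= key
    if pos < 0:
        return None
    block_no = index[pos][1]
    block = blocks[block_no]
    if key in block:
        return (block_no, block.index(key))
    return None
```

The index has one entry per block instead of one per row, at the price of a scan within the block on every lookup.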
>> * For Highly Non-Unique Data we can use Duplicate Compression
>> The latter is the technique used by Bitmap Indexes. Efficient, but not
>> useful for unique/nearly-unique data
Also definitely. This would be hugely useful for things like "status"
or "type" fields.
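For a "status"-style column, the saving can be sketched like this. The Python below is only an illustration with hypothetical names: each distinct key is stored once, followed by the list of row pointers that carry it, instead of repeating the key for every row. A real bitmap index would encode the pointer list as a bitmap, which this sketch does not attempt.

```python
from collections import defaultdict


def build_posting_lists(rows):
    """Group row pointers by key value.

    `rows` is an iterable of (row_pointer, key) pairs; the result maps
    each distinct key to the sorted list of row pointers that have it.
    """
    postings = defaultdict(list)
    for row_pointer, key in rows:
        postings[key].append(row_pointer)
    return {key: sorted(ptrs) for key, ptrs in postings.items()}
```

With only a handful of distinct values the key storage becomes negligible and almost all of the remaining space is row pointers.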
>> * Multi-Column Leading Value Compression - if you have a multi-column
>> index, then leading columns are usually duplicated between rows inserted
>> at the same time. Using an on-block dictionary we can remove duplicates.
>> Only useful for multi-column indexes, possibly overlapping/contained
>> subset of the GIT use case.
Also useful, though I generally try and put the most diverse values
first in indexes to increase the odds of them being used. Perhaps if
we had compression this would change.
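As a rough picture of the on-block dictionary, here is a Python sketch for a two-column index. The names and the two-column layout are assumptions made for the example, not anything from the proposal itself. Each distinct leading value on a block is stored once in a small dictionary, and the entries refer to it by position.

```python
def compress_block(entries):
    """Dictionary-encode the leading column of one block's entries.

    `entries` is a list of (leading, trailing) tuples. Returns the
    per-block dictionary and the entries rewritten as (dict_id, trailing).
    """
    dictionary = []
    ids = {}
    compressed = []
    for leading, trailing in entries:
        if leading not in ids:
            ids[leading] = len(dictionary)
            dictionary.append(leading)
        compressed.append((ids[leading], trailing))
    return dictionary, compressed


def decompress_block(dictionary, compressed):
    """Expand (dict_id, trailing) pairs back into full entries."""
    return [(dictionary[i], trailing) for i, trailing in compressed]
```

The benefit depends on how often the leading value repeats within a block, so an index with the most diverse column first would see little gain from this.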
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828