Re: Time-correlated columns in large tables

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: "Jeroen T(dot) Vermeulen" <jtv(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Time-correlated columns in large tables
Date: 2007-03-05 20:17:34
Message-ID: 45EC7ADE.20206@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeroen T. Vermeulen wrote:
> [Q: Is there some other transparent optimization for values that correlate
> with insertion/update order?]
>
> So I was wondering whether it would make sense to have a more compact kind
> of index. One that partitions the value range of a given column into
> sub-ranges, and for each of those, merely keeps loose track of where those
> value ranges might be found. "Dates from 2006-07-15 to 2006-08-04: try
> blocks 99-126 and 175." Just a fixed maximum of two or three contiguous
> block ranges per value range would probably do the trick. The risk of
> having too few, of course, is that one oddly positioned block could make
> the index look as if a particular value range was spread out throughout
> most of the table.

I think you've just described a range-encoded bitmap index. The idea is
to divide the range of valid values into a some smallish number of
subranges, and for each of these boundary values you store a bitmap
where you set the bit representing every tuple with value < boundary value.

For example, imagine a simple table like this:

position | key
---------+-----
1 | 1
2 | 2
3 | 3
...
10 | 10

If you divide this into for example 4 subranges, you'd get bitmaps

key bitmap
-----+-----------
<= 3 | 1110000000
<= 6 | 1111110000
<= 8 | 1111111100
<= 10| 1111111111

Now to find all tuples <= 3, you'd fetch all tuples in the first bitmap.
To find all tuples <= 2, you'd fetch all tuples in the first bitmap, and
recheck the <= 2 condition. To find anything between say 4 and 8, you'd
take the XOR of the 3rd and 1st bitmap, and recheck the > 4 condition on
resulting tuples.

So basically this allows you to satisfy any range query by reading two
bitmaps and XORing them together. The resulting tuples will generally
need to be rechecked because the index is lossy.

I *really* hope the equality-encoded bitmap index gets into 8.3. I've
been trying to keep range-encoded bitmaps in mind when looking at the
proposed indexam API changes. Hopefully we'll see them in 8.4. Feel free
to submit a patch ;-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-03-05 20:18:47 Re: Bug: Buffer cache is not scan resistant
Previous Message Jeff Davis 2007-03-05 20:12:34 Re: Bug: Buffer cache is not scan resistant