bloom filter indexes?

From: "Mason Hale" <masonhale(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: bloom filter indexes?
Date: 2008-06-03 14:43:25
Message-ID: 8bca3aa10806030743h24a18d4dq334cf91f66aa635d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I've been working on partitioning a rather large dataset into multiple
tables. One limitation I've run into the lack of cross-partition-table
unique indexes. In my case I need to guarantee the uniqueness of a
two-column pair across all partitions -- and this value is not used to
partition the tables. The table is partitioned based on a insert date
timestamp.

To check the uniqueness of this value I've added an insert/update
trigger to search for matches in the other partitions. This trigger is
adding significant overhead to inserts and updates.

This sort of 'membership test' where I need only need to know if the
key exists in the table is a perfect match for bloom filter. (see:
http://en.wikipedia.org/wiki/Bloom_filter)

The Bloom filter can give false positives so using it alone won't
provide the uniqueness check I need, but it should greatly speed up
this process.

Searching around for "postgresql bloom filter" I found this message
from 2005 along the same lines:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg01475.php

This thread indicates bloom filters are used in the intarray contrib
module and the tsearch2 (and I assume the built-in 8.3 full-text
search features).

I also found this assignment for CS course at the University of
Toronto, when entails using bloom filters to speed up large joins:
http://queens.db.toronto.edu/~koudas/courses/cscd43/hw2.pdf

So, my question: are there any general-purpose bloom filter
implementations for postgresql? I'm particularly interested
implementations that would be useful for partitioned tables. Is anyone
working on something like this?

thanks,
- Mason

Responses

Browse pgsql-general by date

  From Date Subject
Next Message PJ 2008-06-03 16:14:43 E_PARSE error ?
Previous Message Stephan Szabo 2008-06-03 14:39:59 Re: join ... using ... and - is this expected behaviour?