Quick Links

Re: bloom filter indexes?

From:	"Mason Hale" <masonhale(at)gmail(dot)com>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: bloom filter indexes?
Date:	2008-06-03 18:06:15
Message-ID:	8bca3aa10806031106j67af071bse6397f783fd6991f@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Tue, Jun 3, 2008 at 12:04 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2008-06-03 at 09:43 -0500, Mason Hale wrote:
>> I've been working on partitioning a rather large dataset into multiple
>> tables. One limitation I've run into the lack of cross-partition-table
>> unique indexes. In my case I need to guarantee the uniqueness of a
>> two-column pair across all partitions -- and this value is not used to
>> partition the tables. The table is partitioned based on a insert date
>> timestamp.
>
> You're looking for a constraint across tables.
>

Yes, for this particular case. But I'm also interested in speeding up
cross-partition queries whether it is for a uniqueness check or not.
This uniqueness check is just one (important) instance of a
cross-partition query.

>> To check the uniqueness of this value I've added an insert/update
>> trigger to search for matches in the other partitions. This trigger is
>> adding significant overhead to inserts and updates.
>
> Do you lock all of the tables before doing the check? If not, then you
> have a race condition.
>

Yes, I was concerned about that.

> It's possible this index strategy will be better for your case.
> However, I think what you really want is some kind of multi-table
> primary key. Have you considered storing the key in its own two-column
> table with a UNIQUE index and having the partitions reference it?

Thanks for the suggestion -- I'll explore maintaining the compound key
in its own non-partitioned table. I was trying to avoid any
application-layer code changes. I guess I can still accomplish that by
updating this table via an insert/update trigger.

But to reiterate, having bloom filter-based index would allow constant
time determination of whether a given partition *may* contain the
data. This would be very useful for large partitioned data-sets,
especially in (very common) cases where performance is critical.

This feature would also be useful for applications where data is
partitioned (aka 'federated') across multiple servers.

In response to

Re: bloom filter indexes? at 2008-06-03 17:04:18 from Jeff Davis

Responses

Re: bloom filter indexes? at 2008-06-03 18:12:25 from Joshua D. Drake

Browse pgsql-general by date

	From	Date	Subject
Next Message	Joshua D. Drake	2008-06-03 18:12:25	Re: bloom filter indexes?
Previous Message	Jeff Davis	2008-06-03 17:04:18	Re: bloom filter indexes?