From: | "Mason Hale" <masonhale(at)gmail(dot)com> |
---|---|
To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: bloom filter indexes? |
Date: | 2008-06-03 18:06:15 |
Message-ID: | 8bca3aa10806031106j67af071bse6397f783fd6991f@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Jun 3, 2008 at 12:04 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2008-06-03 at 09:43 -0500, Mason Hale wrote:
>> I've been working on partitioning a rather large dataset into multiple
>> tables. One limitation I've run into the lack of cross-partition-table
>> unique indexes. In my case I need to guarantee the uniqueness of a
>> two-column pair across all partitions -- and this value is not used to
>> partition the tables. The table is partitioned based on a insert date
>> timestamp.
>
> You're looking for a constraint across tables.
>
Yes, for this particular case. But I'm also interested in speeding up
cross-partition queries whether it is for a uniqueness check or not.
This uniqueness check is just one (important) instance of a
cross-partition query.
>> To check the uniqueness of this value I've added an insert/update
>> trigger to search for matches in the other partitions. This trigger is
>> adding significant overhead to inserts and updates.
>
> Do you lock all of the tables before doing the check? If not, then you
> have a race condition.
>
Yes, I was concerned about that.
> It's possible this index strategy will be better for your case.
> However, I think what you really want is some kind of multi-table
> primary key. Have you considered storing the key in its own two-column
> table with a UNIQUE index and having the partitions reference it?
Thanks for the suggestion -- I'll explore maintaining the compound key
in its own non-partitioned table. I was trying to avoid any
application-layer code changes. I guess I can still accomplish that by
updating this table via an insert/update trigger.
But to reiterate, having bloom filter-based index would allow constant
time determination of whether a given partition *may* contain the
data. This would be very useful for large partitioned data-sets,
especially in (very common) cases where performance is critical.
This feature would also be useful for applications where data is
partitioned (aka 'federated') across multiple servers.
From | Date | Subject | |
---|---|---|---|
Next Message | Joshua D. Drake | 2008-06-03 18:12:25 | Re: bloom filter indexes? |
Previous Message | Jeff Davis | 2008-06-03 17:04:18 | Re: bloom filter indexes? |