From: | Kenneth Marshall <ktm(at)rice(dot)edu> |
---|---|
To: | Mark Mielke <mark(at)mark(dot)mielke(dot)cc> |
Cc: | Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hash index todo list item |
Date: | 2007-09-06 18:08:59 |
Message-ID: | 20070906180858.GB13405@it.is.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
> Hannu Krosing wrote:
>>>> One approahc is not to mix hashes, but to partition the hash, so that
>>>> each column gets its N bits in the hash.
>>> How does that help? You still need all the keys to find out which
>>> bucket to look in.
>>>
>>
>> no. you need to look at only the buckets where that part of hash matches
>>
>> say you allocate bits 4-7 for column 2 and then need to look up column 2
>> value with hash 3 . here you need to look at only buckets N*16 + 3, that
>> is, you need to examine only each 16th bucket
>>
>>
>
> I don't like the truncating hash suggestion because it limits the ability
> of a hash code to uniquely identify a key.
>
> If a user requires the ability to search on both (column1) and (column1,
> column2), they can create two hash indexes and the planner can decide which
> to use.
> Or, they can use a btree. I think hash has a subset of uses where it would
> be a significant gain, and focus should be spent on this subset.
>
> Cheers,
> mark
>
I agree that we should focus primarily on the subset of uses for hash
indexes where there would be a significant gain. I do think that being
able to use a single O(1) hash lookup against all the values specified
in a pseudo-multi-column index could be very beneficial in reducing
access time and I/O.
Since we already have to check the actual tuple values for any index
lookup in postgresql, we could only store the full hash value and the
corresponding TIDs in the bucket. Then when we lookup an item by
calculating its hash, if the exact hash is not present in the bucket,
then we know that the item is not in the index. If the value exists,
then we would check the heap tuples before returning the results. Thus
a negative lookup only needs to check the index and if the hash function
is "good" there will be optimally only 1 possibly valid heap tuple if
there is a match. One very big win for this change is to allow a much
smaller index size (hash value + relevant TIDs) and the large column
values are only stored in the actual data tuples.
Regards,
Ken
> --
> Mark Mielke <mark(at)mielke(dot)cc>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Trevor Talbot | 2007-09-06 18:54:41 | Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC) |
Previous Message | Magnus Hagander | 2007-09-06 17:59:58 | Re: [HACKERS] pg_regress config |