Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

From: Erik Aronesty <erik(at)q32(dot)com>
To: Erwin Brandstetter <brsaweda(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general(at)postgresql(dot)org
Subject: Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Date: 2019-12-15 22:17:15
Message-ID: CAJowKg+BbnR4Wrhse__9RzJaaaYqCUHSjA9eqK3XR7ECub_SRw@mail.gmail.com
Lists: pgsql-general

You can always tweak FNV for whatever byte size or bit size you want.
Sometimes I know a little about the shape of my data and write a
custom FNV that only looks at the first half or the last half of a string,
etc.
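
As a rough illustration of the idea, here is a minimal sketch of FNV-1a in Python with a selectable width (32 or 64 bits), plus a variant that hashes only the last half of a string. The function names (fnv1a, fnv1a_last_half, to_signed) are made up for this example, and the choice of the FNV-1a variant and UTF-8 encoding are assumptions, not something specified in this thread. The offset basis and prime constants are the standard published FNV values.

```python
# Standard FNV parameters: (offset basis, prime) per output width in bits.
FNV_PARAMS = {
    32: (0x811C9DC5, 0x01000193),
    64: (0xCBF29CE484222325, 0x100000001B3),
}


def fnv1a(data: bytes, bits: int = 32) -> int:
    """Return the unsigned FNV-1a hash of data, 32 or 64 bits wide."""
    if bits not in FNV_PARAMS:
        raise ValueError("bits must be 32 or 64")
    offset, prime = FNV_PARAMS[bits]
    mask = (1 << bits) - 1
    h = offset
    for byte in data:
        h ^= byte                 # FNV-1a: XOR the byte in first ...
        h = (h * prime) & mask    # ... then multiply, truncated to the width
    return h


def fnv1a_last_half(text: str, bits: int = 32) -> int:
    """Hypothetical custom variant: hash only the last half of the bytes."""
    data = text.encode("utf-8")   # assumption: UTF-8 encoded input
    return fnv1a(data[len(data) // 2:], bits)


def to_signed(value: int, bits: int) -> int:
    """Map an unsigned hash into the signed range of int4 / bigint."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


if __name__ == "__main__":
    print(hex(fnv1a(b"a")))
    print(to_signed(fnv1a(b"a", 64), 64))
```

Skipping half of the input only pays off when the skipped half carries little information, for example a long common prefix; otherwise it increases collisions. The to_signed helper is there because Postgres integer and bigint are signed types.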

On Wed, Dec 11, 2019, 1:02 PM Erwin Brandstetter <brsaweda(at)gmail(dot)com> wrote:

> Thanks for the suggestion. Seems like a good assumption and I have been
> using hashtext() in the past. But I am uncertain whether it is the best
> option.
>
> Guess Tom's warning in
> https://www.postgresql.org/message-id/9434.1568839177@sss.pgh.pa.us about
> portability only refers to hashtextextended() and friends not being there
> in Postgres 10 or older.
>
> But why are none of these functions documented? Does the project still not
> ...
>
> > want people to rely on them continuing to have exactly the current
> > behavior.
>
> I am not complaining, maybe just nobody did the work. But it's also
> mentioned in this old thread that hashtext() changed in the past. Is all of
> that outdated and we are welcome to use those functions for indexing?
>
> https://www.postgresql.org/message-id/flat/24463.1329854466%40sss.pgh.pa.us#c18e62281dc78f6d64c1a4d41ab8569b
> <https://www.postgresql.org/message-id/24463.1329854466@sss.pgh.pa.us>
>
> Filtering with amprocnum = 2 gets functions producing bigint in Postgres
> 11 or later. Not sure about the exact meaning of amprocnum, manual says
> "Support function number".
>
> Remaining problem either way: no hash function returning bigint for
> Postgres 10.
>
> Regards
> Erwin
>
> On Tue, Dec 10, 2019 at 11:13 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
> wrote:
>
>> On Tue, 2019-12-10 at 22:11 +0100, Erwin Brandstetter wrote:
>> > I am looking for stable hash functions producing 8-byte or 4-byte
>> > hashes from long text values in Postgres 10 or later.
>> >
>> > [...]
>> >
>> > There is an old post from 2012 by Tom Lane suggesting that hashtext()
>> > and friends are not for users:
>> >
>> > https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us
>>
>> Changing a hash function would corrupt hash indexes, wouldn't it?
>>
>> So I'd expect these functions to be pretty stable:
>>
>> SELECT amp.amproc
>> FROM pg_amproc AS amp
>> JOIN pg_opfamily AS opf ON amp.amprocfamily = opf.oid
>> JOIN pg_am ON opf.opfmethod = pg_am.oid
>> WHERE pg_am.amname = 'hash'
>> AND amp.amprocnum = 1;
>>
>> Or at least there would have to be a fat warning in the release notes
>> to reindex hash indexes.
>>
>> Yours,
>> Laurenz Albe
>> --
>> Cybertec | https://www.cybertec-postgresql.com
>>
>>
