Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

From: Erwin Brandstetter <brsaweda(at)gmail(dot)com>
To: Miles Elam <miles(dot)elam(at)productops(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Date: 2019-12-11 18:07:09
Message-ID: CAGHENJ5qN8fNoBnPSvFg9-EjKteZ30TQe24XLH6Req+7sSL4Qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Dec 10, 2019 at 11:34 PM Miles Elam <miles(dot)elam(at)productops(dot)com>
wrote:

> In terms of "wasted computation", MD5, SHA1, and the others always compute
> the full length before they are passed to a UUID, int, or whatever. It's a
> sunk cost. It's also a minor cost considering many hash algorithms are
> performed in CPU hardware now. All that's left is the truncation and cast,
> which you can't avoid easily.
>
>
> Sure, you could reimplement Java's .hashCode() method by iterating through
> the characters and processing the character codes:
>
> s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]
>
>
> I don't see how that would beat the CPU-based hashes though unless you
> wrote a C-based extension. Maybe it's better just to embrace the
> user-defined function first and then decide if performance is insufficient
> for your use cases.
>
>
> CREATE EXTENSION IF NOT EXISTS pgcrypto;
>
> CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5')
> RETURNS int8 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16),
> 'hex'))::bit(64)::int8
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5')
> RETURNS int4 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8),
> 'hex'))::bit(32)::int4
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> SELECT
>
> hash4('something something something'),
>
> hash4('something something something', 'sha1'),
>
> hash8('something something something'),
>
> hash8('something something something', 'sha1');
>
>
>
> Cheers,
>
>
> Miles
>

Thanks for the custom functions! May be useful as fallback.

But I am really looking for standard functions in Postgres first. Those
should be faster and more reliable than writing my own.

Regards

Erwin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Rich Shepard 2019-12-11 19:16:05 Encoding/collation question
Previous Message Erwin Brandstetter 2019-12-11 18:01:43 Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?