From: | Erwin Brandstetter <brsaweda(at)gmail(dot)com> |
---|---|
To: | Miles Elam <miles(dot)elam(at)productops(dot)com> |
Cc: | PG-General Mailing List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Fast, stable, portable hash function producing 4-byte or 8-byte values? |
Date: | 2019-12-11 18:07:09 |
Message-ID: | CAGHENJ5qN8fNoBnPSvFg9-EjKteZ30TQe24XLH6Req+7sSL4Qg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Dec 10, 2019 at 11:34 PM Miles Elam <miles(dot)elam(at)productops(dot)com>
wrote:
> In terms of "wasted computation", MD5, SHA1, and the others always compute
> the full length before they are passed to a UUID, int, or whatever. It's a
> sunk cost. It's also a minor cost considering many hash algorithms are
> performed in CPU hardware now. All that's left is the truncation and cast,
> which you can't avoid easily.
>
>
> Sure, you could reimplement Java's .hashCode() method by iterating through
> the characters and processing the character codes:
>
> s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]
>
>
> I don't see how that would beat the CPU-based hashes though unless you
> wrote a C-based extension. Maybe it's better just to embrace the
> user-defined function first and then decide if performance is insufficient
> for your use cases.
>
>
> CREATE EXTENSION IF NOT EXISTS pgcrypto;
>
> CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5')
> RETURNS int8 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16),
> 'hex'))::bit(64)::int8
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5')
> RETURNS int4 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8),
> 'hex'))::bit(32)::int4
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> SELECT
>
> hash4('something something something'),
>
> hash4('something something something', 'sha1'),
>
> hash8('something something something'),
>
> hash8('something something something', 'sha1');
>
>
>
> Cheers,
>
>
> Miles
>
Thanks for the custom functions! May be useful as fallback.
But I am really looking for standard functions in Postgres first. Those
should be faster and more reliable than writing my own.
Regards
Erwin
From | Date | Subject | |
---|---|---|---|
Next Message | Rich Shepard | 2019-12-11 19:16:05 | Encoding/collation question |
Previous Message | Erwin Brandstetter | 2019-12-11 18:01:43 | Re: Fast, stable, portable hash function producing 4-byte or 8-byte values? |