Re: Question regarding fast-hashing in PGSQL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Conley <cheetah(at)tanabi(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Question regarding fast-hashing in PGSQL
Date: 2019-09-18 20:39:37
Message-ID: 9434.1568839177@sss.pgh.pa.us
Lists: pgsql-performance

Stephen Conley <cheetah(at)tanabi(dot)org> writes:
> My idea was to hash the string to a bigint, because the likelihood of all 3
> columns colliding is almost 0, and if a duplicate does crop up, it isn't
> the end of the world.

> However, Postgresql doesn't seem to have any 'native' hashing calls that
> result in a bigint.

regression=# \df hashtext*
                              List of functions
   Schema   |       Name       | Result data type | Argument data types | Type
------------+------------------+------------------+---------------------+------
 pg_catalog | hashtext         | integer          | text                | func
 pg_catalog | hashtextextended | bigint           | text, bigint        | func
(2 rows)

The "extended" hash API has only been there since v11, so you
couldn't rely on it if you need portability to old servers.
But otherwise it seems to respond precisely to your question.
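For instance, on v11 or later, something along these lines would give a
64-bit hash of all three fields in one call (column and table names here
are just placeholders; hashtextextended takes the text plus a bigint seed):

    -- hypothetical table/column names, seed 0
    SELECT hashtextextended(textcol || '|' || col2::text || '|' || col3::text, 0) AS id_hash
    FROM my_table;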

If you do need portability ... does the text string's part of the
hash *really* have to be 64 bits wide? Why not just concatenate
it with a 32-bit hash of the other fields?
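A rough sketch of that, again with made-up column names: put hashtext() of
the string in the high 32 bits of a bigint and a 32-bit hash of the other
fields in the low 32 bits.

    -- hypothetical column names; hashtext() returns a (possibly negative) integer,
    -- so mask the low half before OR-ing the two together
    SELECT (hashtext(textcol)::bigint << 32)
           | (hashtext(col2::text || '|' || col3::text)::bigint & 4294967295) AS id_hash
    FROM my_table;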

regards, tom lane
