Re: Storing and comparing columns of cryptographic hashes?

From: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To: Josh Triplett <josh(at)joshtriplett(dot)org>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Storing and comparing columns of cryptographic hashes?
Date: 2024-04-08 14:43:51
Message-ID: CAKAnmmLMa-q3F4K42J=yN15VFHyqs5JNbW=tsYQE=ZmdQGVwzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Apr 8, 2024 at 10:08 AM Josh Triplett <josh(at)joshtriplett(dot)org> wrote:

> - Is there a way to tell postgresql "this column contains cryptographic
> hashes, so you can do hash joins using any subset of the bits,
> without having to hash them again"? If not, should there be?
>

No, and no. (if I understand your question correctly). You could use a
functional index, I suppose, but seems premature optimization.

> - Is `bit(256)` the right type to use to store 32-byte hash values with no
> additional overhead?
>

No, you would want bytea. I would store the value in a TEXT field, unless
you really worried about space savings. The hexadecimal value will be far
easier to debug and work with, and you can use a simple b-tree index.

- What would be the simplest way, given an input array of hashes (which
> I may have to pass in as an array and use `unnest`), to filter out all
> the values that already exist, *and* generate a corresponding bitmap
> in the same order for present/not-present for the entire array (to
> send back to the user)? Filtering seems easy enough, but generating
> the bitmap less so.
>

Something like this:

SELECT array_agg(case when t.bhash is null then 1 else 0 end)
from unnest(array['blakehash1', 'blakehash2', etc...]) as a(x)
left join mytable t on t.bhash = a.x;

> - Does it make more sense to store the values as one row per value, or
> as one row per group of values?

Hard to answer without knowing more, but I'd lean towards simple and one
row per value.

Your proposal (query db, do external work, update db) also sets of lots of
concurrency red flags, so be mindful of that.

Cheers,
Greg

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-04-08 14:49:49 Re: PEM install error
Previous Message mark bradley 2024-04-08 14:24:25 PEM install error