Re: How to get an md5/sha256 hash of a really large object in psql?

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: Alex Shan <3341018(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to get an md5/sha256 hash of a really large object in psql?
Date: 2023-07-29 16:57:44
Message-ID: 2099704620.64179.1690649864815@office.mailbox.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On 29/07/2023 08:42 CEST Alex Shan <3341018(at)gmail(dot)com> wrote:
>
> In my DB I have a large object over 4GB in size.
> I need to get its MD5 or SHA256 from within psql query, i.e. without
> exporting it to FS first.
>
> “SELECT md5(lo_get(loid));” doesnt work — “large object is too large”.
>
> Is there any other way to do it?

Is plpython3u [1] an option for you? In that case you can use Python's hashlib
in a custom function and feed every page from pg_largeobject to a selected hash
function.

For example:

BEGIN;

CREATE EXTENSION plpython3u;

CREATE OR REPLACE FUNCTION lo_hash(loid oid, name text)
RETURNS bytea
LANGUAGE plpython3u
AS $$
import hashlib

hash = hashlib.new(name)

# Check if large object exists.
plan = plpy.prepare("""
SELECT
FROM pg_largeobject_metadata
WHERE oid = $1
""", ['oid'])
rv = plpy.execute(plan, [loid])

if rv.nrows() == 0:
raise ValueError(f"large object {loid} does not exist")

# Get all pages (possibly zero).
plan = plpy.prepare("""
SELECT data
FROM pg_largeobject
WHERE loid = $1
ORDER BY pageno
""", ['oid'])
pages = plpy.cursor(plan, [loid])

for page in pages:
hash.update(page['data'])

return hash.digest()
$$;

COMMIT;

Testing with 65 KiB null bytes:

BEGIN;

SELECT
lo_from_bytea(0, decode(repeat('00', 1 << 16), 'hex')) AS test_loid
\gset

SELECT
loid,
count(*) AS n_pages,
sum(length(data)) AS n_bytes
FROM pg_largeobject
WHERE loid = :test_loid
GROUP BY loid;

loid | n_pages | n_bytes
--------+---------+---------
365958 | 32 | 65536
(1 row)

SELECT
:test_loid AS loid,
lo_hash(:test_loid, 'md5') AS md5;

loid | md5
--------+------------------------------------
365958 | \xfcd6bcb56c1689fcef28b57c22475bad
(1 row)

SELECT
:test_loid AS loid,
lo_hash(:test_loid, 'sha256') AS sha256;

loid | sha256
--------+--------------------------------------------------------------------
365958 | \xde2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31
(1 row)

ROLLBACK;

Verifying the hashes:

$ head -c65536 /dev/zero | md5sum
fcd6bcb56c1689fcef28b57c22475bad -
$ head -c65536 /dev/zero | sha256sum
de2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31 -

[1] https://www.postgresql.org/docs/15/plpython.html

--
Erik

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Wen Yi 2023-07-30 11:44:22 [Question]Can someone explain the context_freelists?
Previous Message Philip Warner 2023-07-29 13:49:57 Timeout in Logical Replication