Re: [PATCH] Add crc32(text) & crc32(bytea)

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add crc32(text) & crc32(bytea)
Date: 2024-07-26 15:42:32
Message-ID: ZqPD6Jb3FLjPHaSm@nathan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 26, 2024 at 12:01:40PM +0300, Aleksander Alekseev wrote:
>> This sounds generally reasonable to me, especially given the apparent
>> demand. Should we also introduce crc32c() while we're at it?
>
> Might be a good idea. However I didn't see a demand for crc32c() SQL
> function yet. Also I'm not sure whether the best interface for it
> would be crc32c() or crc32(x, version='c') or perhaps crc32(x,
> polinomial=...). I propose keeping the scope small this time.

I don't think adding crc32c() would sufficiently increase the scope. We'd
use the existing implementations for both crc32() and crc32c(). And
besides, this could be useful for adding tests for that code.

+ <function>crc32</function> ( <type>text</type> )

Do we need a version of the function that takes a text input? It's easy
enough to cast to a bytea.

+ <returnvalue>text</returnvalue>

My first reaction is that we should just have this return bytea like the
SHA ones do, if for no other reason than commit 10cfce3 seems intended to
move us away from returning text for these kinds of functions. Upthread,
you mentioned the possibility of returning a bigint, too. I think I'd
still prefer bytea in case we want to add, say, crc64() or crc16() in the
future. That would allow us to keep all of these functions consistent
instead of returning different types for each. However, I understand that
returning the numeric types might be more convenient. I'm curious what
others think about this.

+ Computes the CRC32 <link linkend="functions-hash-note">hash</link> of
+ the binary string, with the result written in hexadecimal.

I'm not sure we should call the check values "hashes." Wikipedia does
include them in the "List of hash functions" page [0], but it seems to
deliberately avoid calling them hashes in the CRC page [1]. I'd suggest
calling them "CRC32 values" instead.

[0] https://en.wikipedia.org/wiki/List_of_hash_functions
[1] https://en.wikipedia.org/wiki/Cyclic_redundancy_check

--
nathan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-07-26 15:51:42 Re: Assertion failure with summarize_wal enabled during pg_createsubscriber
Previous Message Junwang Zhao 2024-07-26 15:12:21 Re: Extension using Meson as build system