From: | Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: What exactly is our CRC algorithm? |
Date: | 2014-10-08 22:23:59 |
Message-ID: | 5435B97F.3010700@archidevsys.co.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09/10/14 10:13, Andres Freund wrote:
> On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
>> Hmm. So the simple, non-table driven, calculation gives the same result as
>> using the lookup table using the reflected lookup code. That's expected; the
>> lookup method is supposed to be the same, just faster. However, using the
>> "normal" lookup code, but with a "reflected" lookup table, produces the same
>> result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But
>> AFAICS, that's an incorrect combination. You're supposed to the
>> non-reflected lookup table with the non-reflected lookup code; you can't mix
>> and match.
>>
>> As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
>> correspond to any bit-by-bit CRC variant and polynomial. My math skills are
>> not strong enough to determine what the consequences of that are. It might
>> still be a decent checksum. Or not. I couldn't tell if the good error
>> detection properties of the normal CRC-32 polynomial apply to our algorithm
>> or not.
> Additional interesting datapoints are that hstore and ltree contain the
> same tables - but properly use the reflected computation.
>
>> Thoughts?
> It clearly seems like a bad idea to continue with this - I don't think
> anybody here knows which guarantees this gives us.
>
> The question is how can we move away from this. There's unfortunately
> two places that embed PGC32 that are likely to prove problematic when
> fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
> their logic in a persistent way. I think we should provide
> INIT/COMP/FIN_PG32 using the current algorithm for these.
>
> If we're switching to a saner computation, we should imo also switch to
> a better polynom - CRC-32C has better error detection capabilities than
> CRC32 and is available in hardware. As we're paying the price pf
> breaking compat anyway...
>
> Arguably we could also say that given that there's been little evident
> problems with the borked computation we could also switch to a much
> faster hash instead of continuing to use crc...
>
> Greetings,
>
> Andres Freund
>
Could a 64 bit variant of some kind be useful as an option - in addition
to a correct 32 bit?
As most people have 64 bit processors and storage is less constrained
now-a-days, as well as we tend to store much larger chunks of data.
Cheers,
Gavin
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2014-10-08 22:24:51 | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Previous Message | Peter Geoghegan | 2014-10-08 22:18:46 | Re: Promise index tuples for UPSERT |