Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Date: 2013-04-23 08:44:24
Message-ID: CA+CSw_temxBkeyQYykJfLea11LvTT0vYVjp4y6sVjnAraVWbRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Apr 23, 2013 10:17 AM, "Jeff Davis" <pgsql(at)j-davis(dot)com> wrote:
> Attached is my reorganization of Ants's patch here:
>
> http://www.postgresql.org/message-id/CA
> +CSw_vinyf-w45i=M1m__MpJZY=e8S4Nt_KNnpEbtWjTOaSUA(at)mail(dot)gmail(dot)com

Thanks for your help. Some notes below.

> My changes:
>
> * wrest the core FNV algorithm from the specific concerns of a data page
> - PageCalcChecksum16 mixes the block number, reduces to 16 bits,
> and avoids the pd_checksum field
> - the checksum algorithm is just a pure block checksum with a 32-bit
> result
> * moved the FNV checksum into a separate file, checksum.c

I think the function should not be called checksum_fnv as it implies that
we use the well known straightforward implementation. Maybe checksum_block
or some other generic name.

> * added Ants's suggested compilation flags for better optimization

-msse4.1 is not safe to use in default builds. On the other hand it doesn't
hurt to just specify it in CFLAGS for the whole compile (possibly as
-march=native). We should just omit it and mention somewhere that SSE4.1
enabled builds will have better checksum performance.

> * slight update to the paragraph in the README that discusses concerns
> specific to a data page
>
> I do have a couple questions/concerns about Ants's patch though:
>
> * The README mentions a slight bias; does that come from the mod
> (2^16-1)? That's how I updated the README, so I wanted to make sure.

Yes.

> * How was the FNV_PRIME chosen?

I still haven't found the actual source for this value. It's specified as
the value to use for 32bit FNV-1a.

> * I can't match the individual algorithm step as described in the README
> to the actual code. And the comments in the README don't make it clear
> enough which one is right (or maybe they both are, and I'm just tired).

I will try to reword.

> The README says:
>
> hash = (hash ^ value) * ((hash ^ value) >> 3)
>
> (which is obviously missing the FNV_PRIME factor) and the code says:
>
> +#define CHECKSUM_COMP(checksum, value) do {\
> + uint32 __tmp = (checksum) ^ (value);\
> + (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
> +} while (0)
>
> I'm somewhat on the fence about the "shift right". It was discussed in
> this part of the thread:
>
>
http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org
>
> I think we should be able to state with a little more clarity in the
> README why there is a problem with plain FNV-1a, and why this
> modification is both effective and safe.

Florian already mentioned why it's effective. I have an intuition why it's
safe, will try to come up with a well reasoned argument.

Regards,
Antd Aasma

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-04-23 08:47:38 Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Previous Message Florian Pflug 2013-04-23 08:24:40 Re: Substituting Checksum Algorithm (was: Enabling Checksums)