From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enabling Checksums |
Date: | 2013-04-13 15:10:07 |
Message-ID: | 20130413151006.GC10556@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2013-04-13 10:58:53 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote:
> >> As I understand it, SIMD is just a CPU-optimized method for producing a
> >> CRC checksum. Is that right? Does it produce the same result as a
> >> non-CPU-optimized CRC calculation?
>
> > No, we are talking about a different algorithm that produces different
> > results; that's why it's important to choose now, since we can't change
> > it later without breaking pg_upgrade in future releases.
> > http://en.wikipedia.org/wiki/SIMD_%28hash_function%29
>
> [ squint... ] We're talking about a *cryptographic* hash function?
> Why in the world was this considered a good idea for page checksums?
In Ants' implementation it's a heck of a lot faster than any CRC
implementation we have seen so far on relatively large blocks (like pages).
pgbench results:
CA+CSw_uXO-fRkuzL0Yzs0wSdL8dipZV-ugMvYN-yV45SGUBU2w(at)mail(dot)gmail(dot)com
byte/cycle comparison:
CA+CSw_su1fopLNBz1NAfkSNw4_=gv+5pf0KdLQmpvuKW1Q4v+Q(at)mail(dot)gmail(dot)com
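To illustrate why a lane-parallel design can beat a bytewise CRC on a whole page, here is a minimal sketch. It is NOT Ants' algorithm (that is in the patch referenced above). It assumes an 8 KB page, a hypothetical count of 32 independent 32-bit lanes, an FNV-1a-style multiply/xor mixing step, hypothetical per-lane seeds, and one possible 16-bit reduction into 1..65535 so that an all-zeroes page never yields checksum 0. A CRC carries a serial dependency from each byte to the next; here each lane only depends on its own previous state, so the inner loop is a candidate for SIMD. Whether the compiler actually vectorizes it depends on the target and optimization flags.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8192          /* assumed block size */
#define N_LANES   32            /* hypothetical number of independent lanes */
#define MIX_PRIME 16777619u     /* 32-bit FNV prime */

/* One FNV-1a-style step per lane: xor in the word, multiply by the prime,
 * and xor in a right shift so high bits feed back into low bits.
 * Illustrative choice only. */
static inline uint32_t mix(uint32_t state, uint32_t word)
{
    uint32_t tmp = state ^ word;
    return (tmp * MIX_PRIME) ^ (tmp >> 17);
}

/* Hypothetical lane-parallel page checksum reduced to 16 bits.
 * The result depends on host byte order because the page is read
 * as native 32-bit words. */
static uint16_t lane_checksum(const unsigned char *page)
{
    uint32_t words[PAGE_SIZE / 4];
    uint32_t lanes[N_LANES];
    uint32_t result = 0;

    memcpy(words, page, PAGE_SIZE);
    for (int j = 0; j < N_LANES; j++)
        lanes[j] = 0x9E3779B9u + (uint32_t)j;   /* hypothetical seeds */

    /* Each row feeds one word into every lane. Lanes never read each
     * other's state, so the inner loop has no cross-lane dependency. */
    for (size_t i = 0; i < PAGE_SIZE / 4; i += N_LANES)
        for (int j = 0; j < N_LANES; j++)
            lanes[j] = mix(lanes[j], words[i + j]);

    /* Two extra rounds with zero input so the last row is mixed as well. */
    for (int r = 0; r < 2; r++)
        for (int j = 0; j < N_LANES; j++)
            lanes[j] = mix(lanes[j], 0);

    /* Fold lanes together, then map into 1..65535 so 0 is never valid. */
    for (int j = 0; j < N_LANES; j++)
        result ^= lanes[j];
    return (uint16_t)(result % 65535u + 1u);
}
```

Because lanes are folded only at the end, the lane count and mixing function are part of the on-disk format: changing either later changes every stored checksum, which is the pg_upgrade concern raised earlier in the thread.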
> In the first place, it's probably not very fast compared to some
> alternatives, and in the second place, the criteria by which people
> would consider it a good crypto hash function have approximately nothing
> to do with what we need for a checksum function. What we want for a
> checksum function is high probability of detection of common hardware
> failure modes, such as burst errors and all-zeroes. This is
> particularly critical when we're going with only a 16-bit checksum ---
> the probabilities need to be skewed in the right direction, or it's not
> going to be all that terribly useful.
>
> CRCs are known to be good for that sort of thing; it's what they were
> designed for. I'd like to see some evidence that any substitute
> algorithm has similar properties. Without that, I'm going to vote
> against this idea.
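The burst-error property of CRCs mentioned above can be checked directly: a CRC whose generator has degree 16 and a nonzero constant term detects every burst of length 16 or less, while some 17-bit bursts (for example, one equal to the generator itself) slip through. The sketch below uses CRC-16-CCITT (polynomial 0x1021, initial value 0xFFFF) purely as an example 16-bit CRC; it is not PostgreSQL's CRC code, and the buffer size and helper names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 64   /* small test buffer; callers must pass len <= BUF_LEN */

/* Bitwise CRC-16-CCITT: x^16 + x^12 + x^5 + 1 (0x1021), init 0xFFFF,
 * bits consumed MSB-first. Example CRC only. */
static uint16_t crc16_ccitt(const unsigned char *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Flip bit k, counted MSB-first from the start of the buffer, which is
 * the order in which crc16_ccitt consumes bits. */
static void flip_bit(unsigned char *buf, size_t k)
{
    buf[k / 8] ^= (unsigned char)(0x80u >> (k % 8));
}

/* Try every burst of length 1..max_burst at every offset, both as a solid
 * run of flipped bits and as only its two end bits, and count how many
 * leave the CRC unchanged (i.e. go undetected). */
static size_t count_undetected_bursts(const unsigned char *orig, size_t len,
                                      size_t max_burst)
{
    unsigned char buf[BUF_LEN];
    uint16_t good = crc16_ccitt(orig, len);
    size_t undetected = 0;

    for (size_t blen = 1; blen <= max_burst; blen++)
        for (size_t start = 0; start + blen <= len * 8; start++)
            for (int solid = 0; solid <= 1; solid++) {
                memcpy(buf, orig, len);
                for (size_t k = start; k < start + blen; k++)
                    if (solid || k == start || k == start + blen - 1)
                        flip_bit(buf, k);
                if (crc16_ccitt(buf, len) == good)
                    undetected++;
            }
    return undetected;
}

/* Flip a 17-bit burst shaped like the generator polynomial (bits at
 * offsets 0, 4, 11, 16 from start). The error is a multiple of the
 * generator, so the CRC cannot see it: this returns 0. */
static int generator_burst_detected(const unsigned char *orig, size_t len,
                                    size_t start)
{
    static const size_t offsets[] = {0, 4, 11, 16};
    unsigned char buf[BUF_LEN];
    memcpy(buf, orig, len);
    for (size_t i = 0; i < 4; i++)
        flip_bit(buf, start + offsets[i]);
    return crc16_ccitt(buf, len) != crc16_ccitt(orig, len);
}
```

A harness of this shape, run against any candidate 16-bit checksum, is one way to get the kind of evidence asked for above: the CRC is guaranteed zero undetected bursts up to 16 bits, and a substitute algorithm can be measured against the same inputs.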
Ants has done some analysis on this, e.g.
CA+CSw_tMoA85e=1vS4oMjZjG2MR_huLiKoVPd80Dp5RURDSGcQ(at)mail(dot)gmail(dot)com.
That doesn't look bad to me, and unless I am missing something it's better
than our CRC at 16 bits.
So while I would say it's not 100% researched, Ants has done a rather
detailed investigation, and I am quite impressed.
My biggest doubt so far is the reliance on inline assembly for top
performance on x86-64, with a generic implementation elsewhere that is
only really fast with appropriate compiler flags.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services