Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Date: 2013-04-30 17:05:30
Message-ID: 517FF9DA.7060504@2ndQuadrant.com
Lists: pgsql-hackers

I re-ran the benchmark that's had me most worried against the committed
code and things look good so far. I've been keeping quiet because my
tests recently have all agreed with what Ants already described. This
is more a confirmation summary than new data.

The problem case has been Jeff's test 2, "worst-case overhead for
calculating checksum while reading data" from the OS cache. I wrapped
that into a test harness and posted results similar to Jeff's at
http://www.postgresql.org/message-id/5133D732.4090801@2ndQuadrant.com
based on the originally proposed Fletcher-16 checksum.

I've made some system improvements since then, so the absolute runtime
improved for most of the tests I'm running. But the percentage changes
didn't seem off enough to bother re-running the Fletcher tests. Details
are in the attached spreadsheet; to summarize:

-The original Fletcher-16 code slowed this test case down 24 to 32%,
depending on whether you look at the average of 3 runs or the median.

-The initial checksum commit with the truncated WAL CRC was almost an
order of magnitude worse: 146% to 224% slowdown. The test case that
took ~830ms was taking as much as 2652ms with that method. I'm still
not sure why the first run of this test is always so much faster than
the second and third. But since it happens so often I think it's fair
to consider that worst case really important.

-The committed FNV-1a implementation is now slightly better than
Fletcher-16 speed-wise: 19 to 27% slowdown.

-Slicing-by-8 CRC I didn't test, because once I'd fully come around to
agreeing with Ants's position it didn't seem likely to be useful. I
don't want to lose track of that idea though; it might be the right path
for a future implementation with 32-bit checksums.
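
For anyone checking the percentages above, here is a minimal sketch of
the arithmetic. It assumes "slowdown" means the extra runtime relative
to the no-checksum baseline, and that the mean and median figures come
from three runs of each configuration. The function names and the
three-run timings are hypothetical; only the ~830ms and 2652ms figures
come from the numbers quoted above.

```python
from statistics import mean, median


def slowdown_pct(baseline_ms, checksum_ms):
    """Extra runtime with checksums, as a percentage of the baseline."""
    return (checksum_ms - baseline_ms) / baseline_ms * 100.0


def summarize(baseline_runs, checksum_runs):
    """Return (slowdown from means, slowdown from medians) over the runs."""
    by_mean = slowdown_pct(mean(baseline_runs), mean(checksum_runs))
    by_median = slowdown_pct(median(baseline_runs), median(checksum_runs))
    return by_mean, by_median


# Worst case quoted above: a ~830 ms test taking as much as 2652 ms.
worst = slowdown_pct(830.0, 2652.0)
print(f"worst case: {worst:.0f}% slowdown")

# Hypothetical timings in ms, only to show how mean and median can differ.
baseline = [800.0, 1000.0, 1200.0]
with_checksums = [1000.0, 1300.0, 1360.0]
by_mean, by_median = summarize(baseline, with_checksums)
print(f"by mean: {by_mean:.0f}%, by median: {by_median:.0f}%")
```

With the rounded ~830ms baseline the worst case works out to roughly
220%, in the same range as the 224% figure; the exact baseline is in the
spreadsheet rather than in this message.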

Since the >=25% slowdown on this test with Fletcher-16 turned into more
like a 2% drop on more mixed workloads, I'd expect the same to hold
again with the new FNV-1a. I plan to step back to looking at more of
those cases, but it will take a few days at least to start sorting that
out.
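
As background on the hash family mentioned above, here is a minimal
sketch of the textbook 32-bit FNV-1a hash. This is not the committed
PostgreSQL checksum code, which is a modified variant adapted for page
data; the function name is hypothetical and the constants are the
standard published FNV offset basis and prime.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Textbook FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result to 32 bits
    return h


print(hex(fnv1a_32(b"a")))
```

The per-byte work is one XOR and one multiply, which is part of why this
family is cheap compared with a table-driven CRC.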

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
ChecksumMethods.xls application/vnd.ms-excel 28.0 KB
