From: | Michael Banck <michael(dot)banck(at)credativ(dot)de> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Online verification of checksums |
Date: | 2018-07-26 11:59:33 |
Message-ID: | 1532606373.3422.5.camel@credativ.de |
Lists: | pgsql-hackers |
Hi,
v11 almost added online activation of checksums, but all we've got is
pg_verify_checksums, i.e. offline verification of checksums.
However, we also got (online) checksum verification during base backups.
I have ported/adapted David Steele's recheck code to my personal fork of
pg_checksums[1] and removed its check that the cluster is offline (for
verification), and that seems to work fine.
I've now forward-ported this change to pg_verify_checksums, in order to
make this application useful for online clusters; see the attached patch.
I've tested this in a tight loop (while true; do pg_verify_checksums -D
data1 -d > /dev/null || /bin/true; done)[2] while doing "while true; do
createdb pgbench; pgbench -i -s 10 pgbench > /dev/null; dropdb pgbench;
done", which I already used to develop the original code in the fork and
which brought up a few bugs.
I got one checksum verification failure this way; all others were
caught by the recheck (I've introduced a 500ms delay for the first ten
failures), like this:
|pg_verify_checksums: checksum verification failed on first attempt in
|file "data1/base/16837/16850", block 7770: calculated checksum 785 but
|expected 5063
|pg_verify_checksums: block 7770 in file "data1/base/16837/16850"
|verified ok on recheck
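The recheck described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the attached patch: `verify_block_with_recheck`, `read_block` and `checksum_ok` are hypothetical names, and the checksum test is passed in as a callable rather than implementing PostgreSQL's page checksum. The 500ms default delay is taken from the description above.

```python
import time
from typing import Callable


def verify_block_with_recheck(
    read_block: Callable[[int], bytes],
    checksum_ok: Callable[[bytes, int], bool],
    blkno: int,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "ok", "ok_on_recheck" or "failed" for block blkno.

    read_block and checksum_ok are hypothetical callables supplied by
    the caller; they stand in for the file read and the page checksum
    comparison.
    """
    if checksum_ok(read_block(blkno), blkno):
        return "ok"
    # On a running cluster the first mismatch may be a torn read: the
    # server could have been writing the block while it was being read.
    # Wait briefly, read the block again and compare once more.
    sleep(delay)
    if checksum_ok(read_block(blkno), blkno):
        return "ok_on_recheck"
    return "failed"
```

Only a block that fails both attempts would be reported as a verification failure; a block that passes on the second read corresponds to the "verified ok on recheck" message.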
However, I am also seeing sporadic failures (roughly once every two
pgbench runs) like this:
|pg_verify_checksums: short read of block 2644 in file
|"data1/base/16637/16650", got only 4096 bytes
This is not strictly a verification failure; should we do anything about
it? In my fork, I also recheck in this case[3] (and I am happy to
extend the patch that way), but that makes the code and the patch more
complicated, and I wanted to check the general opinion on this case
first.
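For discussion, rechecking on a short read could look roughly like the sketch below. It is written in Python for brevity and is not the code in the fork: `read_block_retrying` is a hypothetical helper, the 8192-byte block size is PostgreSQL's default, and the single retry with a 500ms delay is an assumption.

```python
import os
import time
from typing import Callable, Optional

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def read_block_retrying(
    fd: int,
    blkno: int,
    retries: int = 1,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Read one full block, retrying if the read comes back short.

    Returns the block contents, or None if the block is still short
    after all retries (for example because the file ends there).
    """
    for attempt in range(retries + 1):
        # os.pread reads at an absolute offset without moving the
        # file position of fd.
        data = os.pread(fd, BLOCK_SIZE, blkno * BLOCK_SIZE)
        if len(data) == BLOCK_SIZE:
            return data
        if attempt < retries:
            # The relation may be in the middle of being extended;
            # give the writer a moment before reading again.
            sleep(delay)
    return None
```

A caller would skip (or merely log) a block for which this returns None, instead of counting it as a checksum failure.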
Michael
[1] https://github.com/credativ/pg_checksums/commit/dc052f0d6f1282d3c8215b0eb28b8e7c4e74f9e5
[2] while patching out the somewhat unhelpful (in regular operation,
anyway) debug message for every successful checksum verification
[3] https://github.com/credativ/pg_checksums/blob/master/pg_checksums.c#L160
--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz
Attachment | Content-Type | Size |
---|---|---|
online-verification-of-checksums_V1.patch | text/x-patch | 4.1 KB |