From: | Michael Banck <michael(dot)banck(at)credativ(dot)de> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: [PATCH] Verify Checksums during Basebackups |
Date: | 2018-03-22 23:05:51 |
Message-ID: | 1521759951.15036.21.camel@credativ.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Magnus,
thanks a lot for looking at my patch!
Am Donnerstag, den 22.03.2018, 15:07 +0100 schrieb Magnus Hagander:
> On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de> wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > > Possibly open questions:
> > >
> > > 1. I have not so far changed the replication protocol to make verifying
> > > checksums optional. I can go about that next if the consensus is that we
> > > need such an option (and cannot just check it everytime)?
> >
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> >
> > Updated patch attached
> >
>
> Notes:
>
> + if (checksum_failure == true)
>
> Really, just if(checksum_failure)
>
> + errmsg("checksum mismatch during basebackup")));
>
> Should be "base backup" in messages.
I've changed both.
> +static const char *skip[] = {
>
> I think that needs a much better name than just "skip". Skip for what?
> In particular since we are just skipping it for checksums, and not for
> the actual basebackup, that name is actively misinforming.
I have copied that verbatim from the online checksum patch, but of
course this is in src/backend/replication and not src/bin so warrants
more scrutiny. If you plan to commit both for v11, it might make sense
to have that separated out to a more central place?
But I guess what we mean is a test for "is a heap file". Do you have a
good suggestion where it should end up so that pg_verify_checksums can
use it as well?
In the meantime, I've changed the skip[] array to no_heap_files[] and
the skipfile() function to is_heap_file(), also reversing the logic. If
it helps pg_verify_checksums, we could make is_not_a_heap_file()
instead.
> + filename = basename(pstrdup(readfilename));
> + if (!noverify_checksums && DataChecksumsEnabled() &&
> + !skipfile(filename) &&
> + (strncmp(readfilename, "./global/", 9) == 0 ||
> + strncmp(readfilename, "./base/", 7) == 0 ||
> + strncmp(readfilename, "/", 1) == 0))
> + verify_checksum = true;
>
> I would include the checks for global, base etc into the skipfile()
> function as well (also renamed).
Check. I had to change the way (the previous) skipfile() works a bit,
because it was expecting a filename as argument, while we check
pathnames in the above.
> + * Only check pages which have not been modified since the
> + * start of the base backup.
>
> I think this needs a description of why, as well (without having read
> this thread, this is a pretty subtle case).
I tried to expand on this some more.
> +system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";
>
> This part of the test will surely fail on Windows, not having a
> /dev/zero. Can we easily implement this part natively in perl perhaps?
Right, this was one of the open questions. I now came up with a perl 4-
liner that seems to do the trick, but I can't test it on Windows.
> Most of that stuff is trivial and can be cleaned up at commit time. Do
> you want to send an updated patch with a few of those fixes, or should
> I clean it?
I've attached a new patch, but I have not addressed the question whether
skipfile()/is_heap_file() should be moved somewhere else yet.
I found one more cosmetic issue: if there is an external tablespace, and
pg_basebackup encounters corruption, you would get the message
"pg_basebackup: changes to tablespace directories will not be undone"
from cleanup_directories_atexit(), which I now also suppress in case of
checksum failures.
> The test thing is a stopper until we figure that one out though. And
> while at it -- it seems we don't have any tests for the checksum
> feature in general. It would probably make sense to consider that at
> the same time as figuring out the right way to do this one.
I don't want to deflect work, but it seems to me the online checksums
patch would be in a better position to generally test checksums while
it's at it. Or did you mean something related to pg_basebackup?
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Attachment | Content-Type | Size |
---|---|---|
basebackup-verify-checksum-V4.patch | text/x-patch | 16.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2018-03-22 23:13:16 | Re: [HACKERS] MERGE SQL Statement for PG11 |
Previous Message | Dmitry Dolgov | 2018-03-22 22:25:02 | Re: [HACKERS] [PATCH] Generic type subscripting |