From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: a very minor bug and a couple of comment changes for basebackup.c |
Date: | 2023-03-02 07:59:35 |
Message-ID: | ZABXZ/mfr40VLK7w@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 02, 2023 at 03:23:51PM -0500, Robert Haas wrote:
> 0001 fixes what I believe to be a slight logical error in sendFile(),
> introduced by me during the v15 development cycle when I introduced
> the bbsink abstraction. I believe that it is theoretically possible
> for this to cause an assertion failure, although the chances of that
> actually happening seem extremely remote in practice. I don't believe
> there are any consequences worse than that; for instance, I don't
> think this can result in your backup getting corrupted. See the
> proposed commit message for full details. Because I believe that this
> is formally a bug, I am inclined to think that this should be
> back-patched, but I also think it's fairly likely that no one would
> ever notice if we didn't. However, patch authors have been known to be
> wrong about the consequences of their own bugs from time to time, so
> please do let me know if this seems more serious to you than what I'm
> indicating, or conversely if you think it's not a problem at all for
> some reason.
Seems right, I think that you should backpatch that as
VERIFY_CHECKSUMS is the default.
> 0002 removes an old comment from the file that I find useless and
> slightly misleading.
Okay.
> 0003 rewrites a comment about the way that we verify checksums during
> backups. If we get a checksum mismatch, we reread the block and see if
> the perceived problem goes away. If it doesn't, then we report it.
> This is intended as protection against the backup reading a block
> while some other process is in the midst of writing it, but there's no
> guarantee that any concurrent write will finish quickly enough for our
> second read attempt to see the updated contents.
There is more to it: the page LSN is checked before its checksum.
Hence, if the page's LSN is corrupted in a way that it is higher than
sink->bbs_state->startptr, the checksum verification is just skipped
while the page is broken but not reported as such. (Spoiler: this has
been mentioned in the past, and maybe we'd better remove this stuff in
its entirety.)
> The comment claims
> otherwise, and that's false, and I'm getting tired of reading this
> false claim every time I read this code, so I rewrote the comment to
> say what I believe to be true, namely, that our algorithm is flaky and
> we have no good way to fix that right now. I'm pretty sure that Andres
> pointed this problem out when this feature was under discussion, but
> somehow it's still like this. There's another nearby comment which is
> also false or at least misleading for basically the same reasons which
> probably should be rewritten too, but I was a bit less certain how to
> rewrite it and it wasn't making me as annoyed as this one, so for now
> I only rewrote the one.
Indeed, that's an improvement ;)
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | shiy.fnst@fujitsu.com | 2023-03-02 08:06:53 | RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Previous Message | Drouvot, Bertrand | 2023-03-02 07:39:14 | Re: Generate pg_stat_get_xact*() functions with Macros |