From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: An improvement of ProcessTwoPhaseBuffer logic |
Date: | 2025-01-17 02:44:16 |
Message-ID: | 20250117024416.8c.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jan 17, 2025 at 11:04:03AM +0900, Michael Paquier wrote:
> On Thu, Jan 16, 2025 at 04:52:21PM -0800, Noah Misch wrote:
> > In other words, the root problem that led to commits e358425 and 7e125b2 was
> > recovery interpreting pg_twophase file content before reaching consistency.
> > We can't make the ProcessTwoPhaseBuffer() checks safe before reaching
> > consistency. Is that right? Once ProcessTwoPhaseBuffer() stops running
> > before consistency, it won't strictly need the range checks. We may as well
> > have something like those range checks as a best-effort way to detect base
> > backup spec violations.
>
> Oh, yeah, it seems like you have a very good point here.
> ProcessTwoPhaseBuffer() is an effort to use a unique path for the
> handling of this 2PC data, and we cannot do that safely at this stage.
> The problem is restoreTwoPhaseData(), which is called very early in
Agreed.
> the recovery process, while all the other code paths calling
> ProcessTwoPhaseBuffer() are done when we're doing reading WAL after
> we've reached consistency as far as I can see. This is a wrong
> concept since 728bd991c3c4. 5a1dfde8334b has made that much harder to
> think about, while not integrating fully things. It also looks like
> we may be able to just get rid of the CLOG checks entirely if we
> overwrite existing files as long as we are not in a consistent state,
> as you say. Hmm.
>
> I agree that e358425 and 7e125b2 are weird attempts that just try to
> work around the real issue, and that they're making some cases wrong
> while on it with potential file removals.
>
> +typedef void (*TwoPhaseCallback) (FullTransactionId fxid, uint16 info,
> void *recdata, uint32 len);
>
> Based on your latest patch, I doubt that we'll be able to do any of
> that in the back-branches. That's much nicer in the long term to show
> what this code relies on.
The twophase.c bits of that patch turned out to be orthogonal to the key
issue, so that's fine. I suspect the signature changes would be okay to
back-patch to v17, subject to PGXN confirmation of no extension using those
APIs. (The TwoPhaseCallback list is hard-coded; extensions can't add
callbacks.) That said, the rationale for back-patching fxid stuff is gone.
> I've been thinking about the strategy to use here, and here are some
> rough steps:
> - Let's first revert e358425 and 7e125b2. This indeed needs to be
> reworked, and there is a release coming by.
+1. e358425 is borderline, but it's hard to rule out ways of it making
recovery unlink pg_twophase files improperly. Released code may also be doing
that, but that code's seven years of history suggests it does so infrequently
if at all. We don't have as much evidence about what the frequency would be
under e358425.
> - Your refactoring around xid8funcs.c is a good idea on its own. This
> can be an independent patch.
I got here on a yak shave for postgr.es/m/20250111214454.9a.nmisch@google.com.
I expect that project will still want FullTransactionIdFromAllowableAt(). If
so, I'll include it in that thread's patch series.
> - Let's figure out how much surgery is required with a focus on HEAD
> for now (I'm feeling that more non-backpatchable pieces are going to
> be needed here), then extract the relevant pieces that could be
> backpatchable. Hard to say if this is going to be doable at this
> stage, or even worth doing, but it will be possible to dig into the
> details once we're happy with the state of HEAD. My first intuition
> is that the risk of touching the back-branches may not be worth the
> potential reward based on the lack of field complaints (not seen
> customer cases, tbh): high risk, low reward.
Let's see how it turns out. My first intuition would be to target a
back-patch, because a corner-case PITR failure is the kind of thing that can
go unreported and yet be hard to forgive.
I'm not confident I could fix both that other thread's data loss bug and
$SUBJECT in time for the 2024-02 releases. (By $SUBJECT, I mean the
seven-year-old bug of interpreting pg_twophase before recovery consistency,
which has a higher chance of causing spurious recovery failure starting in
v17.) Would your or someone else like to fix $SUBJECT, before or after
2024-02 releases? I'd make time to review a patch.
> Another point to consider is if we'd better switch 2PC files to store
> a fxid rather than a xid.. Perhaps that's not necessary, but if we're
> using FullTransactionIds across the full stack of twophase.c that
> could make the whole better with even less xid <-> fxid translations
> in all these paths. There is always the counter-argument of the extra
> 4 bytes in the 2PC files and the records, though.
Yes, perhaps so. It could also be an option to store it only in the
pg_twophase file, not in the WAL record. Files are a lot rarer.
Similarly, I wondered if pg_twophase files should contain the LSN of the
XLOG_XACT_PREPARE record, which would make the file's oldness unambiguous in a
way fxid doesn't achieve. I didn't come up with a problem whose solution
requires it, though.
Thanks,
nm
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-01-17 03:03:43 | Re: convert libpgport's pqsignal() to a void function |
Previous Message | Richard Guo | 2025-01-17 02:23:50 | Re: Adding OLD/NEW support to RETURNING |