From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date: | 2023-09-04 07:17:21 |
Message-ID: | CA+hUKGLcT4ttqts4ow1=ZF9c+AwU=YfovfPs=r-Y2n0G-BunFA@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, Sep 4, 2023 at 3:54 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Mon, Sep 04, 2023 at 03:20:31PM +1200, Thomas Munro wrote:
> > 1. In the place where we fail to allocate memory for an oversized
> > record, I copied the comment about treating that as a "bogus data"
> > condition. I suspect that we will soon be converting that to a FATAL
> > error[1], and that'll need to be done in both places.
>
> You mean for the two callers of XLogReadRecordAlloc(), even for the
> case where !allow_oversized? Using a FATAL on non-FRONTEND would be
> the quickest fix, indeed, but there are arguments for standbys where we
> could let these continue, as well. That would be an improvement over
> the always-FATAL on OOM, of course.

I just mean the two places where "bogus data" is mentioned in that v5 patch.

> > But if you
> > want to be able to distinguish garbage from out-of-memory, and thereby
> > end-of-wal from a FATAL please-insert-more-RAM condition, I think
> > you'd really need this industrial strength validation in all affected
> > branches, and I'd have more work to do, right? The weak validation we
> > are fixing here is the *real* underlying problem going back many
> > years, right?
>
> Getting the same validation checks for all the branches would be nice.
> FATAL-ing on OOM to force recovery to happen again is a better option
> than assuming that it is the end of recovery. I am OK to provide
> patches for all the branches for the sake of this thread, if that
> helps. Switching to a hard FATAL on OOM for the WAL reader in the
> backend is backpatchable, but I'd rather consider that on a different
> thread once the better checks for the record header are in place.

OK, so it sounds like you want to go back to 12. Let me see if I can
get this TAP test to work in 12... more tomorrow.
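
To make the garbage vs. out-of-memory distinction discussed above concrete, here is a toy sketch in Python (not C, and not taken from the patch). The 8-byte header layout, `make_record`, `read_record` and the `max_alloc` budget are all hypothetical simplifications and do not reflect PostgreSQL's real record format or reader API. The only point is the ordering: if the header is validated first, a later allocation failure can be reported as a real resource problem instead of being mistaken for end-of-WAL.

```python
import struct
import zlib
from enum import Enum


class ReadResult(Enum):
    OK = "ok"
    END_OF_WAL = "end-of-wal"        # header looks like garbage
    OUT_OF_MEMORY = "out-of-memory"  # header is valid, allocation refused


# Hypothetical header: total record length and CRC of the payload,
# both little-endian uint32. This is NOT the real WAL record header.
HEADER = struct.Struct("<II")


def make_record(payload: bytes) -> bytes:
    """Build a toy record: header followed by payload."""
    return HEADER.pack(HEADER.size + len(payload), zlib.crc32(payload)) + payload


def read_record(buf: bytes, max_alloc: int) -> ReadResult:
    """Classify a toy record, validating the header before 'allocating'."""
    if len(buf) < HEADER.size:
        return ReadResult.END_OF_WAL
    tot_len, crc = HEADER.unpack_from(buf)
    # Strong validation first: an implausible length or a bad CRC means
    # we are looking at garbage, which is treated as end-of-WAL.
    if tot_len < HEADER.size or tot_len > len(buf):
        return ReadResult.END_OF_WAL
    if zlib.crc32(buf[HEADER.size:tot_len]) != crc:
        return ReadResult.END_OF_WAL
    # The header is trustworthy, so failing to get memory for the record
    # is a genuine out-of-memory condition (max_alloc simulates the limit).
    if tot_len > max_alloc:
        return ReadResult.OUT_OF_MEMORY
    return ReadResult.OK
```

With weak validation the last two cases collapse into one, which is why a failed allocation had to be treated as "bogus data"; with the stronger checks in place a caller could choose to FATAL only on `OUT_OF_MEMORY`.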