Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-04 07:17:21
Message-ID: CA+hUKGLcT4ttqts4ow1=ZF9c+AwU=YfovfPs=r-Y2n0G-BunFA@mail.gmail.com
Lists: pgsql-bugs

On Mon, Sep 4, 2023 at 3:54 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Mon, Sep 04, 2023 at 03:20:31PM +1200, Thomas Munro wrote:
> > 1. In the place where we fail to allocate memory for an oversized
> > record, I copied the comment about treating that as a "bogus data"
> > condition. I suspect that we will soon be converting that to a FATAL
> > error[1], and that'll need to be done in both places.
>
> You mean for the two callers of XLogReadRecordAlloc(), even for the
> case where !allow_oversized? Using a FATAL on non-FRONTEND would be
> the quickest fix, indeed, but there are arguments for standbys where we
> could let these continue, as well. That would be an improvement over
> the always-FATAL on OOM, of course.

I just mean the two places where "bogus data" is mentioned in that v5 patch.

> > But if you
> > want to be able to distinguish garbage from out-of-memory, and thereby
> > end-of-WAL from a FATAL please-insert-more-RAM condition, I think
> > you'd really need this industrial strength validation in all affected
> > branches, and I'd have more work to do, right? The weak validation we
> > are fixing here is the *real* underlying problem going back many
> > years, right?
>
> Getting the same validation checks for all the branches would be nice.
> FATAL-ing on OOM to force recovery to happen again is a better option
> than assuming that it is the end of recovery. I am OK to provide
> patches for all the branches for the sake of this thread, if that
> helps. Switching to a hard FATAL on OOM for the WAL reader in the
> backend is backpatchable, but I'd rather consider that on a different
> thread once the better checks for the record header are in place.

OK, so it sounds like you want to go back to 12. Let me see if I can
get this TAP test to work in 12... more tomorrow.
