Re: 12.3 replicas falling over during WAL redo

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: bench(at)silentmedia(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: 12.3 replicas falling over during WAL redo
Date: 2020-08-05 05:42:13
Message-ID: 20200805.144213.1618481147898303093.horikyota.ntt@gmail.com
Lists: pgsql-general

At Tue, 4 Aug 2020 09:53:36 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Aug-03, Alvaro Herrera wrote:
>
> > >      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> > > --------------+----------+-------+-------+-------+---------+----------+---------+-----------
> > >  A0A/99BA11F8 |     -215 |     0 |   180 |  7240 |    8176 |     8192 |       4 |         0
> > >
> > > As I understand what we're looking at, this means the WAL stream was
> > > assuming this page was last touched by A0A/AB2C43D0, but the page itself
> > > thinks it was last touched by A0A/99BA11F8, which means at least one write
> > > to the page is missing?
> >
> > Yeah, that's exactly what we're seeing. Somehow an older page version
> > was resurrected. Of course, this should never happen.
>
> ... although, the block should have been in shared buffers, and it is
> there that the previous WAL record would have updated -- not necessarily
> flushed to disk.

Yeah. On the other hand, the WAL records shown upthread don't have an FPW.

> rmgr: Btree len (rec/tot): 72/ 72, tx: 76393394, lsn: A0A/AB2C43D0, prev A0A/AB2C4378, desc: INSERT_LEAF off 41, blkref #0: rel 16605/16613/60529051 blk 6501

> rmgr: Btree len (rec/tot): 72/ 72, tx: 76396065, lsn: A0A/AC4204A0, prev A0A/AC420450, desc: INSERT_LEAF off 48, blkref #0: rel 16605/16613/60529051 blk 6501

There must be a record for page 6501 carrying an FPW after the last
checkpoint. If none is found, something went wrong when deciding
whether to attach an FPW.
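
One way to look for such a record is to filter pg_waldump output for
this block and see whether any record carries a full-page image. The
following is only a rough sketch: it assumes each record is on a single
line in the format quoted above, and that a block reference with an
applicable image ends in " FPW", as pg_waldump prints it by default. The
helper names (parse_lsn, records_for_block) are made up for
illustration.

```python
import re

# Block reference as printed by pg_waldump, e.g.
#   blkref #0: rel 16605/16613/60529051 blk 6501 FPW
# An image marked "FPW for WAL verification" is not applied during redo,
# so it is deliberately not counted as an FPW here.
BLKREF = re.compile(
    r"blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork \w+)? blk (\d+)"
    r"( FPW(?! for WAL verification))?"
)
LSN = re.compile(r"lsn:\s+([0-9A-F]+/[0-9A-F]+)")


def parse_lsn(text):
    """Convert 'A0A/AB2C43D0' to an integer so LSNs compare numerically."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def records_for_block(lines, relfilenode, blkno):
    """Yield (lsn, has_fpw) for each record that references the block."""
    for line in lines:
        m_lsn = LSN.search(line)
        if m_lsn is None:
            continue
        for m in BLKREF.finditer(line):
            if int(m.group(3)) == relfilenode and int(m.group(4)) == blkno:
                yield m_lsn.group(1), m.group(5) is not None


if __name__ == "__main__":
    # The two records quoted upthread, each joined onto one line.
    sample = [
        "rmgr: Btree len (rec/tot): 72/ 72, tx: 76393394, lsn: A0A/AB2C43D0, "
        "prev A0A/AB2C4378, desc: INSERT_LEAF off 41, blkref #0: rel "
        "16605/16613/60529051 blk 6501",
        "rmgr: Btree len (rec/tot): 72/ 72, tx: 76396065, lsn: A0A/AC4204A0, "
        "prev A0A/AC420450, desc: INSERT_LEAF off 48, blkref #0: rel "
        "16605/16613/60529051 blk 6501",
    ]
    for lsn, has_fpw in records_for_block(sample, 60529051, 6501):
        print(lsn, "FPW" if has_fpw else "no FPW")
```

To be useful, the input would have to cover the WAL from the redo point
of the last checkpoint up to A0A/AB2C43D0. If no line for blk 6501 in
that range reports an FPW, that would match the suspicion above.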

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
