Re: 12.3 replicas falling over during WAL redo

From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 12.3 replicas falling over during WAL redo
Date: 2020-08-03 20:26:59
Message-ID: c6b3f53a-b250-b30b-c1a9-e0e866cda13b@silentmedia.com
Lists: pgsql-general

Peter Geoghegan wrote on 8/3/20 11:25 AM:
> On Sun, Aug 2, 2020 at 9:39 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>> All of the cited log lines seem to suggest a relation with deleted btree
>> page items. One possibility I can guess at is that this can happen if the
>> pages were flushed out during a vacuum after the last checkpoint and
>> full-page writes didn't restore the page to its state before the
>> index-item deletion happened (that is, if full_page_writes was set to
>> off). (If that turns out to be the cause, I'm not sure why it didn't
>> happen on 9.5.)
> There is also a Heap/HOT_UPDATE log line with similar errors.

Yes, and I have the pg_waldump output for it. However, that table is quite
large, and the transaction that contains the LSN from the error log spans
1,752 lines of pg_waldump output. I'm happy to share whatever would help
debug this, though I'm guessing a subset of that output would be enough.
