Re: 12.3 replicas falling over during WAL redo

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Ben Chobot <bench(at)silentmedia(dot)com>
Cc:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: 12.3 replicas falling over during WAL redo
Date:	2020-08-01 16:35:51
Message-ID:	20200801163551.GA12860@alvherre.pgsql
Lists:	pgsql-general

On 2020-Aug-01, Ben Chobot wrote:

> We have a few hundred postgres servers in AWS EC2, all of which do streaming
> replication to at least two replicas. As we've transitioned our fleet
> from 9.5 to 12.3, we've noticed an alarming increase in the frequency of a
> streaming replica dying during replay. Postgres will log something like:
>
> 2020-07-31T16:55:22.602488+00:00 hostA postgres[31875]: [19137-1] db=,user= LOG: restartpoint starting: time
> 2020-07-31T16:55:24.637150+00:00 hostA postgres[24076]: [15754-1] db=,user= FATAL: incorrect index offsets supplied
> 2020-07-31T16:55:24.637261+00:00 hostA postgres[24076]: [15754-2] db=,user= CONTEXT: WAL redo at BCC/CB7AF8B0 for Btree/VACUUM: lastBlockVacuumed 1720
> 2020-07-31T16:55:24.642877+00:00 hostA postgres[24074]: [8-1] db=,user= LOG: startup process (PID 24076) exited with exit code 1

I've never seen this one.

Can you find out which index is being modified at those LSNs, and whether
it is always the same index? Can you also have a look at nearby WAL records
that touch the same page of the same index in each case?
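
As a rough sketch of one way to start on that, the Python below maps an LSN
from the log (here BCC/CB7AF8B0) to the WAL segment file that contains it and
builds a pg_waldump command restricted to Btree records from the start of
that segment. Several things are assumptions not stated in the report: the
default 16 MB WAL segment size, timeline 1, and the hypothetical pg_wal path.
The helper names are made up for illustration.

```python
import shlex

# Assumption: default 16 MB WAL segments (initdb --wal-segsize not changed).
WAL_SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Turn an LSN such as 'BCC/CB7AF8B0' into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(pos: int) -> str:
    """Inverse of parse_lsn: format a byte position as 'HI/LO' in hex."""
    return "%X/%X" % (pos >> 32, pos & 0xFFFFFFFF)


def wal_file_name(lsn: str, timeline: int = 1, seg_size: int = WAL_SEG_SIZE) -> str:
    """Name of the WAL segment file containing the given LSN."""
    segs_per_xlogid = 0x100000000 // seg_size
    segno = parse_lsn(lsn) // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def segment_start_lsn(lsn: str, seg_size: int = WAL_SEG_SIZE) -> str:
    """LSN of the first byte of the segment containing the given LSN."""
    pos = parse_lsn(lsn)
    return format_lsn(pos - pos % seg_size)


def waldump_command(failing_lsn, wal_dir, timeline=1, end_lsn=None):
    """Build a pg_waldump argument list showing only Btree records."""
    cmd = [
        "pg_waldump",
        "--path", wal_dir,
        "--timeline", str(timeline),
        "--rmgr", "Btree",
        "--start", segment_start_lsn(failing_lsn),
    ]
    if end_lsn is not None:
        cmd += ["--end", end_lsn]
    return cmd


if __name__ == "__main__":
    lsn = "BCC/CB7AF8B0"  # taken from the CONTEXT line in the log above
    print(wal_file_name(lsn))
    # Hypothetical data directory; adjust to the replica's actual pg_wal.
    print(shlex.join(waldump_command(lsn, "/var/lib/postgresql/12/main/pg_wal")))
```

Each record in the pg_waldump output carries block references naming the
tablespace, database, and relfilenode plus the block number, so filtering
for the block involved in the failing record should show the earlier
records that touched that page. The relfilenode can then probably be mapped
back to an index name with pg_filenode_relation() on the primary. To look
further back than one segment, pass an earlier --start value by hand.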

One possibility is that the storage lost an earlier write, so the page on
the replica is older than what the WAL record expects to find.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
