Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.

From: Georgy Shelkovy <g(dot)shelkovy(at)arenadata(dot)io>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
Date: 2024-08-08 09:38:21
Message-ID: 188271723109777@mail.yandex.com
Lists: pgsql-bugs


On the second run I hit the bug.


08.08.2024, 14:30, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>:

On 08/08/2024 10:57, Georgy Shelkovy wrote:


 Unfortunately, the reproduction is not very reliable, but it does trigger sometimes.

 I added some commands to show the last WAL records.

Thanks. I still haven't been able to reproduce it, but here's a theory:

When determining whether the target needs rewinding, pg_rewind looks at the target's last checkpoint record, or if it's a standby, its minRecoveryPoint. It's possible that standby2's minRecoveryPoint is indeed before the point of divergence. That means it has replayed the 340 insert records, but all the changes are still only sitting in the shared buffer cache. When you shut it down, those 340 inserts are gone on standby2. When you restart it, they will be applied again from the WAL.

In that case, pg_rewind's conclusion that no rewind is needed is correct. standby2 is strictly behind standby1, and could catch up directly to it. However, when you restart standby2, it will first replay the WAL it had streamed from master.
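
As an illustration of the comparison described above, here is a minimal Python sketch. It is a simplification, not pg_rewind's actual code: the function names and all LSN values are made up for the example, and it assumes the decision reduces to comparing the target's recorded end point (minRecoveryPoint on a standby) with the point of divergence.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to an integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rewind_needed(target_end_lsn: str, divergence_lsn: str) -> bool:
    """Simplified rule: rewind only if the target's recorded end point
    lies beyond the point where the timelines diverged."""
    return parse_lsn(target_end_lsn) > parse_lsn(divergence_lsn)


# Hypothetical values for the scenario in the theory above:
divergence = "0/3000100"          # where standby1's timeline forked
min_recovery_point = "0/3000060"  # what standby2's control file records
replayed_up_to = "0/3000400"      # WAL standby2 actually holds in pg_wal

# The control file says standby2 is behind the fork, so no rewind is chosen,
# even though WAL past the fork is still present locally.
print(rewind_needed(min_recovery_point, divergence))
print(rewind_needed(replayed_up_to, divergence))
```

Under these assumed values, the first check reports that no rewind is needed while the second would have asked for one, which is the gap the theory describes.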

Can you show the full output of pg_controldata on all the servers, please? In your latest snippet, you showed just the checkpoint locations, but if you just remove the "grep checkpoint | grep location" filters, it would print the whole thing. I'm particularly interested in the minRecoveryPoint on standby2, in the cases when it works and when it doesn't.
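
For comparing the working and failing runs, a small helper along these lines could pull individual fields out of captured pg_controldata output. This is only a sketch: the sample text below is invented for illustration and is not output from any of the servers in this thread, and the helper name is hypothetical.

```python
def parse_controldata(output: str) -> dict[str, str]:
    """Split 'Label:   value' lines of pg_controldata output into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Invented sample, showing only a few of the fields pg_controldata prints.
SAMPLE = """\
Database cluster state:               shut down in recovery
Latest checkpoint location:           0/3000028
Minimum recovery ending location:     0/3000060
"""

fields = parse_controldata(SAMPLE)
print(fields["Minimum recovery ending location"])
```

Running it against the unfiltered output saved from each server would make it easy to put the "Minimum recovery ending location" values side by side.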

I'm not sure what the right behavior would be if that's the issue. Perhaps pg_rewind should truncate the WAL in standby2/pg_wal/ in that case, so that when you start it up again, it would not replay the local WAL but would connect to standby1 directly. Also, perhaps a fast shutdown of a standby server should update minRecoveryPoint before exiting.

 
--

Heikki Linnakangas

Neon (https://neon.tech)

 

Attachment        Content-Type        Size
unknown_filename  text/html           2.2 KB
r.log             text/plain          32.1 KB
r.sh              text/x-shellscript  4.9 KB
