On the second run I hit the bug.
 
 
08.08.2024, 14:30, "Heikki Linnakangas" <hlinnaka@iki.fi>:

On 08/08/2024 10:57, Georgy Shelkovy wrote:

 Unfortunately, the reproduction is not very reliable, but sometimes it triggers.
 I added some commands to show the last WAL records.


Thanks. I still haven't been able to reproduce it, but here's a theory:

When determining whether the target needs rewinding, pg_rewind looks at
the target's last checkpoint record, or if it's a standby, its
minRecoveryPoint. It's possible that standby2's minRecoveryPoint is
indeed before the point of divergence. That means it has replayed the
340 insert records, but all the changes are still only sitting in the
shared buffer cache. When you shut it down, those 340 inserts are gone
on standby2. When you restart it, they will be applied again from the WAL.
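To make that decision concrete, here is a simplified model of it. It assumes the end of the target's WAL is taken as the later of the end of its last checkpoint record and its minRecoveryPoint, and that a rewind is needed only if that lies past the point of divergence. The function names are hypothetical; this is not pg_rewind's actual code.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'X/Y' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def target_wal_end(checkpoint_end: str, min_recovery_point: str) -> int:
    """Assumed end of the target's WAL: the later of the end of the last
    checkpoint record and minRecoveryPoint (0/0 on a cleanly stopped primary)."""
    return max(parse_lsn(checkpoint_end), parse_lsn(min_recovery_point))


def needs_rewind(checkpoint_end: str, min_recovery_point: str, divergence: str) -> bool:
    """Rewind only if the target's WAL extends past the divergence point;
    otherwise the target is strictly behind the source and can just catch up."""
    return target_wal_end(checkpoint_end, min_recovery_point) > parse_lsn(divergence)
```

With made-up LSNs, a standby whose minRecoveryPoint sits before the fork, e.g. `needs_rewind("0/3000100", "0/3000100", "0/3000200")`, comes out as not needing a rewind, which is the situation described above for standby2.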

In that case, pg_rewind's conclusion that no rewind is needed is
correct. standby2 is strictly behind standby1, and could catch up
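directly to it. However, when you restart standby2, it will first replay
the WAL it had streamed from master, which can extend past the point of
divergence, so standby2 ends up on a diverged history after all.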
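The mismatch in this theory can be put into a second hypothetical model: pg_rewind only checks minRecoveryPoint, but on restart the standby first replays whatever is in its local pg_wal/. The function and its three outcomes are illustrative assumptions, and the model ignores the checkpoint location for brevity.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'X/Y' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify(min_recovery_point: str, local_wal_end: str, divergence: str) -> str:
    """Hypothetical classification of a standby after a fast shutdown.

    min_recovery_point: what pg_controldata reports, and what pg_rewind checks
    local_wal_end:      how far the WAL streamed from the old master reaches in pg_wal/
    divergence:         where the new primary's timeline forked off
    """
    mrp = parse_lsn(min_recovery_point)
    wal_end = parse_lsn(local_wal_end)
    fork = parse_lsn(divergence)
    if mrp > fork:
        return "rewind needed"        # pg_rewind sees the divergent history
    if wal_end > fork:
        return "diverges on restart"  # pg_rewind says OK, but local replay crosses the fork
    return "can catch up"             # strictly behind the new primary
```

The middle case is the suspected bug: `classify("0/3000100", "0/3400000", "0/3000200")` returns `"diverges on restart"`.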

Can you show the full output of pg_controldata on all the servers,
please? In your latest snippet, you showed just the checkpoint
locations, but if you just remove the "grep checkpoint | grep location"
filters, it would print the whole thing. I'm particularly interested in
the minRecoveryPoint on standby2, in the cases when it works and when it
doesn't.
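If it helps to compare several servers side by side, a small script can collect the whole pg_controldata output instead of grepping it. This is a sketch that assumes pg_controldata is on PATH; it forces the C locale so the labels stay in English, and read_controldata() and parse_controldata() are hypothetical helper names.

```python
import os
import subprocess


def parse_controldata(output: str) -> dict[str, str]:
    """Turn 'Label:   value' lines from pg_controldata into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; values such as timestamps contain colons too.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def read_controldata(datadir: str) -> dict[str, str]:
    """Run pg_controldata on a data directory (assumes it is on PATH)."""
    env = {**os.environ, "LC_ALL": "C"}  # keep the labels untranslated
    result = subprocess.run(
        ["pg_controldata", "-D", datadir],
        capture_output=True, text=True, check=True, env=env,
    )
    return parse_controldata(result.stdout)
```

For example, `read_controldata("standby2")["Minimum recovery ending location"]` would give the minRecoveryPoint, assuming the data directory is called standby2.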

I'm not sure what the right behavior would be if that's the issue.
Perhaps pg_rewind should truncate the WAL in standby2/pg_wal/ in that
case, so that when you start it up again, it would not replay the local
WAL but would connect to standby1 directly. Also, perhaps a fast
shutdown of a standby server should update minRecoveryPoint before exiting.
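As a rough illustration of the truncation idea (only a proposal here, not something pg_rewind currently does), this sketch works out which files in pg_wal/ hold WAL past the divergence point. It assumes the default 16 MB wal_segment_size and only reports file names. The segment that contains the divergence point would have to be cut mid-file rather than deleted, so it is reported separately. All names are hypothetical.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024          # assumption: default wal_segment_size
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")  # TTTTTTTTXXXXXXXXYYYYYYYY


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'X/Y' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_start_lsn(name: str, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """First LSN stored in a WAL segment file (the timeline digits are ignored)."""
    segs_per_xlogid = 0x1_0000_0000 // seg_size
    segno = int(name[8:16], 16) * segs_per_xlogid + int(name[16:24], 16)
    return segno * seg_size


def wal_past_divergence(names, divergence: str, seg_size: int = WAL_SEGMENT_SIZE):
    """Return (segments entirely after the fork, segments straddling the fork)."""
    fork = parse_lsn(divergence)
    whole, straddling = [], []
    for name in sorted(names):
        if not SEGMENT_NAME.match(name):
            continue  # skip .history files, archive_status/, etc.
        start = segment_start_lsn(name, seg_size)
        if start >= fork:
            whole.append(name)
        elif start + seg_size > fork:
            straddling.append(name)
    return whole, straddling
```

With a fork at 0/3000200, segment ...04 lies wholly past it and ...03 straddles it, while ...02 is untouched.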
 

--
Heikki Linnakangas
Neon (https://neon.tech)