Unfortunately, the reproduction is not very stable, but sometimes it fires. I added some commands to show the last WAL records.
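In essence, the added commands follow this pattern for each node (a condensed sketch of what the full script below does; `PGDATA` here stands for the node's data directory: master, standby1 or standby2):
```bash
# WAL segment that holds the latest checkpoint's REDO record
WAL_FILE="$(pg_controldata "$PGDATA" | grep "Latest checkpoint's REDO WAL file:" | cut -d : -f 2 | grep -oP "\w+")"
# Checkpoint locations from the control file
pg_controldata "$PGDATA" | grep "checkpoint" | grep "location"
# Last 10 records of that segment (pg_waldump also looks under $PGDATA/pg_wal)
pg_waldump -p "$PGDATA" "$WAL_FILE" | tail -n 10
```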
Script:
```bash
#!/bin/sh -eux
 
# Go to the script's directory
cd "$(dirname "$0")"
# Stop master if it exists
test -f master/postmaster.pid && pg_ctl -w -D master stop -m fast || echo $?
# Remove master directory
rm -rf master
# Create master directory
mkdir -p master
# Initialize master; data checksums are required for pg_rewind (see the note after the script)
pg_ctl initdb -D master -o "--data-checksums -N -A trust --wal-segsize 1"
# Debug logging to see WAL recycling; keep only 5 WAL files (segment size is 1MB)
cat >>master/postgresql.conf <<EOF
hot_standby = on
logging_collector = on
log_min_messages = debug
#synchronous_standby_names = postgres
wal_keep_size = 5MB
EOF
# Accept replication connections on master
cat >> master/pg_hba.conf <<EOF
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
EOF
# Start master
pg_ctl -w -D master start -o "-p 5000"
# Stop standby1 if it exists
test -f standby1/postmaster.pid && pg_ctl -w -D standby1 stop -m fast || echo $?
# Remove standby1 directory
rm -rf standby1
# Take a base backup with WAL files included
pg_basebackup -D standby1 -p 5000 -X fetch --verbose --write-recovery-conf
# Start standby1
pg_ctl -w -D standby1 start -o "-p 5001"
# Stop standby2 if it exists
test -f standby2/postmaster.pid && pg_ctl -w -D standby2 stop -m fast || echo $?
# Remove standby2 directory
rm -rf standby2
# Take a base backup with WAL files included
pg_basebackup -D standby2 -p 5000 -X fetch --verbose --write-recovery-conf
# Start standby2
pg_ctl -w -D standby2 start -o "-p 5002"
# Create a table and insert a lot of rows
psql -d postgres -a --no-psqlrc -p 5000 <<EOF
create table a (a int, b int);
insert into a select i, i from generate_series(1, 100000) i;
EOF
# Stop master to recycle WAL files
pg_ctl -w -D master stop -m fast
# Stop standby1 to recycle WAL files
pg_ctl -w -D standby1 stop -m fast
# Stop standby2 to recycle WAL files
pg_ctl -w -D standby2 stop -m fast
# Start master
pg_ctl -w -D master start -o "-p 5000"
# Start standby1
pg_ctl -w -D standby1 start -o "-p 5001"
# Start standby2
pg_ctl -w -D standby2 start -o "-p 5002"
# Find the WAL file of the master's latest checkpoint REDO record
LAST_MASTER_WAL_FILE="$(pg_controldata master | grep "Latest checkpoint's REDO WAL file:" | cut -d : -f 2 | grep -oP "\w+")"
# Now promote standby1
pg_ctl -w -D standby1 promote
# Wait until standby1 is promoted
while ! pg_isready -p 5001; do sleep 1; done
# Find the WAL file of standby1's latest checkpoint REDO record
LAST_STANDBY_WAL_FILE="$(pg_controldata standby1 | grep "Latest checkpoint's REDO WAL file:" | cut -d : -f 2 | grep -oP "\w+")"
# Print the latest timeline history file
cat "$(find standby1/pg_wal -name "*.history" -type f -print0 | xargs -r -0 ls -1 -t | head -1)"
# Print control information of master
pg_controldata master | grep "checkpoint" | grep "location"
# Print control information of standby1
pg_controldata standby1 | grep "checkpoint" | grep "location"
# Print control information of standby2
pg_controldata standby2 | grep "checkpoint" | grep "location"
# Print last 10 lines from dumped master WAL file
pg_waldump -p master "$LAST_MASTER_WAL_FILE" | tail -n 10
# Print last 10 lines from dumped standby1 WAL files
pg_waldump -p standby1 "$LAST_MASTER_WAL_FILE" | tail -n 10
pg_waldump -p standby1 "$LAST_STANDBY_WAL_FILE" | tail -n 10
# Print last 10 lines from dumped standby2 WAL file
pg_waldump -p standby2 "$LAST_MASTER_WAL_FILE" | tail -n 10
# Insert some rows
psql -d postgres -a --no-psqlrc -p 5000 <<EOF
--set synchronous_standby_names = 'postgres';
--select pg_reload_conf();
insert into a select i, i from generate_series(1, 340) i;
--commit;
EOF
# The standby2 node needs to be stopped as well.
pg_ctl -w -D standby2 stop -m fast
# Stop the master after standby promotion
pg_ctl -w -D master stop -m fast
# The standby1 node needs to be stopped as well.
pg_ctl -w -D standby1 stop -m fast
# Print control information of master
pg_controldata master | grep "checkpoint" | grep "location"
# Print control information of standby1
pg_controldata standby1 | grep "checkpoint" | grep "location"
# Print control information of standby2
pg_controldata standby2 | grep "checkpoint" | grep "location"
# Print last 10 lines from dumped master WAL file
pg_waldump -p master "$LAST_MASTER_WAL_FILE" | tail -n 10
# Print last 10 lines from dumped standby1 WAL files
pg_waldump -p standby1 "$LAST_MASTER_WAL_FILE" | tail -n 10
pg_waldump -p standby1 "$LAST_STANDBY_WAL_FILE" | tail -n 10
# Print last 10 lines from dumped standby2 WAL file
pg_waldump -p standby2 "$LAST_MASTER_WAL_FILE" | tail -n 10
# Run pg_rewind using standby1's pgdata as the source
pg_rewind --progress --debug --source-pgdata=standby1 --target-pgdata=standby2
# Parameters for standby2's postgresql.conf
cat >> standby2/postgresql.conf <<EOF
primary_conninfo = 'port=5001'
EOF
# Create the standby.signal file for standby2
touch standby2/standby.signal
# Start standby1
pg_ctl -w -D standby1 start -o "-p 5001"
# Start standby2
pg_ctl -w -D standby2 start -o "-p 5002"
# Run the query on standby1 (the new primary)
psql -d postgres -a --no-psqlrc -p 5001 <<EOF
select count(*) from a;
EOF
# The same query on standby2 gives a different result
psql -d postgres -a --no-psqlrc -p 5002 <<EOF
select count(*) from a;
EOF
# Stop standby1
pg_ctl -w -D standby1 stop -m fast
# Stop standby2
pg_ctl -w -D standby2 stop -m fast
```
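A note on the `--data-checksums` option used at initdb time above: pg_rewind requires the target cluster to have either data checksums enabled or `wal_log_hints = on`. A quick way to verify this on an already-initialized node (a minimal sketch, using standby2 as an example):
```bash
# A non-zero "Data page checksum version" means checksums are enabled
pg_controldata standby2 | grep "Data page checksum version"
# ...otherwise hint bits must be WAL-logged on the target
grep -E "^[[:space:]]*wal_log_hints" standby2/postgresql.conf
```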
Output:
```bash
+ dirname /var/lib/postgresql/ADBDEV/5716.1/r16_2.sh
+ cd /var/lib/postgresql/ADBDEV/5716.1
+ test -f master/postmaster.pid
+ echo 1
1
+ rm -rf master
+ mkdir -p master
+ pg_ctl initdb -D master -o '--data-checksums -N -A trust --wal-segsize 1'
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
 
The database cluster will be initialized with locale "ru_RU.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "russian".
 
Data page checksums are enabled.
 
fixing permissions on existing directory master ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Yekaterinburg
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
 
Sync to disk skipped.
The data directory might become corrupt if the operating system crashes.
 
Success. You can now start the database server using:
 
    /usr/local/bin/pg_ctl -D master -l logfile start
 
+ cat
+ cat
+ pg_ctl -w -D master start -o '-p 5000'
waiting for server to start....2024-08-08 12:52:03.032 +05 [1911] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:03.032 +05 [1911] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:03.035 +05 [1911] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:03.035 +05 [1911] DEBUG:  created dynamic shared memory control segment 1884847300 (26976 bytes)
2024-08-08 12:52:03.035 +05 [1911] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:03.035 +05 [1911] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:03.035 +05 [1911] HINT:  Future log output will appear in directory "log".
 done
server started
+ test -f standby1/postmaster.pid
+ echo 1
1
+ rm -rf standby1
+ pg_basebackup -D standby1 -p 5000 -X fetch --verbose --write-recovery-conf
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/800028 on timeline 1
pg_basebackup: write-ahead log end point: 0/800100
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
+ pg_ctl -w -D standby1 start -o '-p 5001'
waiting for server to start....2024-08-08 12:52:03.886 +05 [1924] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:03.886 +05 [1924] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:03.888 +05 [1924] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:03.889 +05 [1924] DEBUG:  created dynamic shared memory control segment 689671304 (26976 bytes)
2024-08-08 12:52:03.889 +05 [1924] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:03.889 +05 [1924] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:03.889 +05 [1924] HINT:  Future log output will appear in directory "log".
 done
server started
+ test -f standby2/postmaster.pid
+ echo 1
1
+ rm -rf standby2
+ pg_basebackup -D standby2 -p 5000 -X fetch --verbose --write-recovery-conf
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/900028 on timeline 1
pg_basebackup: write-ahead log end point: 0/900100
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
+ pg_ctl -w -D standby2 start -o '-p 5002'
waiting for server to start....2024-08-08 12:52:04.870 +05 [1936] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:04.870 +05 [1936] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:04.873 +05 [1936] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:04.873 +05 [1936] DEBUG:  created dynamic shared memory control segment 111286480 (26976 bytes)
2024-08-08 12:52:04.873 +05 [1936] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:04.874 +05 [1936] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:04.874 +05 [1936] HINT:  Future log output will appear in directory "log".
 done
server started
+ psql -d postgres -a --no-psqlrc -p 5000
create table a (a int, b int);
CREATE TABLE
insert into a select i, i from generate_series(1, 100000) i;
INSERT 0 100000
+ pg_ctl -w -D master stop -m fast
waiting for server to shut down....2024-08-08 12:52:05.333 +05 [1912] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D standby1 stop -m fast
waiting for server to shut down....2024-08-08 12:52:05.483 +05 [1925] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D standby2 stop -m fast
waiting for server to shut down....2024-08-08 12:52:05.585 +05 [1937] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D master start -o '-p 5000'
waiting for server to start....2024-08-08 12:52:05.628 +05 [1952] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:05.628 +05 [1952] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:05.631 +05 [1952] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:05.631 +05 [1952] DEBUG:  created dynamic shared memory control segment 1418826688 (26976 bytes)
2024-08-08 12:52:05.631 +05 [1952] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:05.632 +05 [1952] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:05.632 +05 [1952] HINT:  Future log output will appear in directory "log".
 done
server started
+ pg_ctl -w -D standby1 start -o '-p 5001'
waiting for server to start....2024-08-08 12:52:05.736 +05 [1962] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:05.736 +05 [1962] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:05.739 +05 [1962] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:05.739 +05 [1962] DEBUG:  created dynamic shared memory control segment 1087487228 (26976 bytes)
2024-08-08 12:52:05.739 +05 [1962] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:05.739 +05 [1962] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:05.739 +05 [1962] HINT:  Future log output will appear in directory "log".
 done
server started
+ pg_ctl -w -D standby2 start -o '-p 5002'
waiting for server to start....2024-08-08 12:52:05.841 +05 [1971] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:05.841 +05 [1971] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:05.844 +05 [1971] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:05.844 +05 [1971] DEBUG:  created dynamic shared memory control segment 3045047604 (26976 bytes)
2024-08-08 12:52:05.845 +05 [1971] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:05.845 +05 [1971] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:05.845 +05 [1971] HINT:  Future log output will appear in directory "log".
 done
server started
+ pg_controldata master
+ grep 'Latest checkpoint'"'"'s REDO WAL file:'
+ cut -d : -f 2
+ grep -oP '\w+'
+ LAST_MASTER_WAL_FILE=000000010000000000000010
+ pg_ctl -w -D standby1 promote
waiting for server to promote.... done
server promoted
+ pg_isready -p 5001
/tmp:5001 - accepting connections
+ pg_controldata standby1
+ grep 'Latest checkpoint'"'"'s REDO WAL file:'
+ cut -d : -f 2
+ grep -oP '\w+'
+ LAST_STANDBY_WAL_FILE=000000020000000000000010
+ find standby1/pg_wal -name '*.history' -type f -print0
+ xargs -r -0 ls -1 -t
+ head -1
+ cat standby1/pg_wal/00000002.history
1    0/10392E8    no recovery target specified
+ pg_controldata master
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/1039270
Latest checkpoint's REDO location:    0/1039270
+ pg_controldata standby1
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/1039350
Latest checkpoint's REDO location:    0/1039318
+ pg_controldata standby2
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/1039270
Latest checkpoint's REDO location:    0/1039270
+ pg_waldump -p master 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1039270: invalid record length at 0/10392E8: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039048, prev 0/01039008, desc: INSERT off: 101, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039088, prev 0/01039048, desc: INSERT off: 102, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010390C8, prev 0/01039088, desc: INSERT off: 103, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039108, prev 0/010390C8, desc: INSERT off: 104, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039148, prev 0/01039108, desc: INSERT off: 105, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039188, prev 0/01039148, desc: INSERT off: 106, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ pg_waldump -p standby1 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1039270: invalid record length at 0/10392E8: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039048, prev 0/01039008, desc: INSERT off: 101, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039088, prev 0/01039048, desc: INSERT off: 102, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010390C8, prev 0/01039088, desc: INSERT off: 103, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039108, prev 0/010390C8, desc: INSERT off: 104, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039148, prev 0/01039108, desc: INSERT off: 105, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039188, prev 0/01039148, desc: INSERT off: 106, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ pg_waldump -p standby1 000000020000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/10393C8: unexpected pageaddr 0/1038000 in WAL segment 000000020000000000000010, LSN 0/103A000, offset 237568
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039148, prev 0/01039108, desc: INSERT off: 105, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039188, prev 0/01039148, desc: INSERT off: 106, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn: 0/010392E8, prev 0/01039270, desc: END_OF_RECOVERY tli 2; prev tli 1; time 2024-08-08 12:52:05.946660 +05
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/01039318, prev 0/010392E8, desc: RUNNING_XACTS nextXid 732 latestCompletedXid 731 oldestRunningXid 732
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039350, prev 0/01039318, desc: CHECKPOINT_ONLINE redo 0/1039318; tli 2; prev tli 2; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 732; online
rmgr: XLOG        len (rec/tot):     49/   885, tx:          0, lsn: 0/010393C8, prev 0/01039350, desc: FPI_FOR_HINT , blkref #0: rel 1663/5/1259 blk 0 FPW
+ pg_waldump -p standby2 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1039270: invalid record length at 0/10392E8: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039048, prev 0/01039008, desc: INSERT off: 101, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039088, prev 0/01039048, desc: INSERT off: 102, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010390C8, prev 0/01039088, desc: INSERT off: 103, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039108, prev 0/010390C8, desc: INSERT off: 104, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039148, prev 0/01039108, desc: INSERT off: 105, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039188, prev 0/01039148, desc: INSERT off: 106, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ psql -d postgres -a --no-psqlrc -p 5000
--set synchronous_standby_names = 'postgres';
--select pg_reload_conf();
insert into a select i, i from generate_series(1, 340) i;
INSERT 0 340
--commit;
+ pg_ctl -w -D standby2 stop -m fast
waiting for server to shut down....2024-08-08 12:52:06.096 +05 [1972] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D master stop -m fast
waiting for server to shut down....2024-08-08 12:52:06.203 +05 [1953] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D standby1 stop -m fast
waiting for server to shut down....2024-08-08 12:52:06.302 +05 [1963] DEBUG:  logger shutting down
 done
server stopped
+ pg_controldata master
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/1041370
Latest checkpoint's REDO location:    0/1041370
+ pg_controldata standby1
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/103B2F0
Latest checkpoint's REDO location:    0/103B2F0
+ pg_controldata standby2
+ grep checkpoint
+ grep location
Latest checkpoint location:           0/1039270
Latest checkpoint's REDO location:    0/1039270
+ pg_waldump -p master 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1041370: invalid record length at 0/10413E8: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041148, prev 0/01041108, desc: INSERT off: 215, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041188, prev 0/01041148, desc: INSERT off: 216, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/010411C8, prev 0/01041188, desc: INSERT off: 217, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041208, prev 0/010411C8, desc: INSERT off: 218, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041248, prev 0/01041208, desc: INSERT off: 219, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041288, prev 0/01041248, desc: INSERT off: 220, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/010412C8, prev 0/01041288, desc: INSERT off: 221, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041308, prev 0/010412C8, desc: INSERT off: 222, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Transaction len (rec/tot):     34/    34, tx:        732, lsn: 0/01041348, prev 0/01041308, desc: COMMIT 2024-08-08 12:52:06.085591 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01041370, prev 0/01041348, desc: CHECKPOINT_SHUTDOWN redo 0/1041370; tli 1; prev tli 1; fpw true; xid 0:733; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ pg_waldump -p standby1 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1039270: invalid record length at 0/10392E8: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039048, prev 0/01039008, desc: INSERT off: 101, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039088, prev 0/01039048, desc: INSERT off: 102, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010390C8, prev 0/01039088, desc: INSERT off: 103, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039108, prev 0/010390C8, desc: INSERT off: 104, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039148, prev 0/01039108, desc: INSERT off: 105, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039188, prev 0/01039148, desc: INSERT off: 106, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ pg_waldump -p standby1 000000020000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/103B2F0: invalid record length at 0/103B368: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/010391C8, prev 0/01039188, desc: INSERT off: 107, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Heap        len (rec/tot):     63/    63, tx:        731, lsn: 0/01039208, prev 0/010391C8, desc: INSERT off: 108, flags: 0x00, blkref #0: rel 1663/5/16384 blk 442
rmgr: Transaction len (rec/tot):     34/    34, tx:        731, lsn: 0/01039248, prev 0/01039208, desc: COMMIT 2024-08-08 12:52:05.194389 +05
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039270, prev 0/01039248, desc: CHECKPOINT_SHUTDOWN redo 0/1039270; tli 1; prev tli 1; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn: 0/010392E8, prev 0/01039270, desc: END_OF_RECOVERY tli 2; prev tli 1; time 2024-08-08 12:52:05.946660 +05
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/01039318, prev 0/010392E8, desc: RUNNING_XACTS nextXid 732 latestCompletedXid 731 oldestRunningXid 732
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01039350, prev 0/01039318, desc: CHECKPOINT_ONLINE redo 0/1039318; tli 2; prev tli 2; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 732; online
rmgr: XLOG        len (rec/tot):     49/   885, tx:          0, lsn: 0/010393C8, prev 0/01039350, desc: FPI_FOR_HINT , blkref #0: rel 1663/5/1259 blk 0 FPW
rmgr: XLOG        len (rec/tot):     49/  7061, tx:          0, lsn: 0/01039740, prev 0/010393C8, desc: FPI_FOR_HINT , blkref #0: rel 1663/5/1249 blk 17 FPW
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/0103B2F0, prev 0/01039740, desc: CHECKPOINT_SHUTDOWN redo 0/103B2F0; tli 2; prev tli 2; fpw true; xid 0:732; oid 16387; multi 1; offset 0; oldest xid 722 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
+ pg_waldump -p standby2 000000010000000000000010
+ tail -n 10
pg_waldump: error: error in WAL record at 0/1041348: invalid record length at 0/1041370: expected at least 24, got 0
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041108, prev 0/010410C8, desc: INSERT off: 214, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041148, prev 0/01041108, desc: INSERT off: 215, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041188, prev 0/01041148, desc: INSERT off: 216, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/010411C8, prev 0/01041188, desc: INSERT off: 217, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041208, prev 0/010411C8, desc: INSERT off: 218, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041248, prev 0/01041208, desc: INSERT off: 219, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041288, prev 0/01041248, desc: INSERT off: 220, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/010412C8, prev 0/01041288, desc: INSERT off: 221, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Heap        len (rec/tot):     63/    63, tx:        732, lsn: 0/01041308, prev 0/010412C8, desc: INSERT off: 222, flags: 0x00, blkref #0: rel 1663/5/16384 blk 443
rmgr: Transaction len (rec/tot):     34/    34, tx:        732, lsn: 0/01041348, prev 0/01041308, desc: COMMIT 2024-08-08 12:52:06.085591 +05
+ pg_rewind --progress --debug '--source-pgdata=standby1' '--target-pgdata=standby2'
pg_rewind: Source timeline history:
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: servers diverged at WAL location 0/10392E8 on timeline 1
pg_rewind: no rewind required
+ cat
+ touch standby2/standby.signal
+ pg_ctl -w -D standby1 start -o '-p 5001'
waiting for server to start....2024-08-08 12:52:06.432 +05 [2044] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:06.432 +05 [2044] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:06.436 +05 [2044] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:06.436 +05 [2044] DEBUG:  created dynamic shared memory control segment 8407908 (26976 bytes)
2024-08-08 12:52:06.436 +05 [2044] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:06.436 +05 [2044] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:06.436 +05 [2044] HINT:  Future log output will appear in directory "log".
 done
server started
+ pg_ctl -w -D standby2 start -o '-p 5002'
waiting for server to start....2024-08-08 12:52:06.539 +05 [2054] DEBUG:  registering background worker "logical replication launcher"
2024-08-08 12:52:06.539 +05 [2054] DEBUG:  mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory
2024-08-08 12:52:06.542 +05 [2054] DEBUG:  dynamic shared memory system will support 674 segments
2024-08-08 12:52:06.542 +05 [2054] DEBUG:  created dynamic shared memory control segment 615788588 (26976 bytes)
2024-08-08 12:52:06.542 +05 [2054] DEBUG:  max_safe_fds = 986, usable_fds = 1000, already_open = 4
2024-08-08 12:52:06.543 +05 [2054] LOG:  redirecting log output to logging collector process
2024-08-08 12:52:06.543 +05 [2054] HINT:  Future log output will appear in directory "log".
 done
server started
+ psql -d postgres -a --no-psqlrc -p 5001
select count(*) from a;
 count  
--------
 100000
(1 row)
 
+ psql -d postgres -a --no-psqlrc -p 5002
select count(*) from a;
 count  
--------
 100340
(1 row)
 
+ pg_ctl -w -D standby1 stop -m fast
waiting for server to shut down....2024-08-08 12:52:06.792 +05 [2045] DEBUG:  logger shutting down
 done
server stopped
+ pg_ctl -w -D standby2 stop -m fast
waiting for server to shut down....2024-08-08 12:52:06.830 +05 [2055] DEBUG:  logger shutting down
 done
server stopped
```
PostgreSQL is compiled from the latest REL_16_STABLE branch.
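For reference, the build looks roughly like this (a minimal sketch; the configure options are an assumption rather than an exact record of my environment, and only the /usr/local prefix matches the paths in the output above):
```bash
# Build PostgreSQL from the tip of the REL_16_STABLE branch
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout REL_16_STABLE
./configure --prefix=/usr/local --enable-debug
make -j"$(nproc)" && make install
```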


08.08.2024, 12:03, "Heikki Linnakangas" <hlinnaka@iki.fi>:

On 07/08/2024 15:19, PG Bug reporting form wrote:

 Sometimes pg_rewind mistakenly assumes that nothing needs to be done, which
 results in the replica having data that is not on the master.
 
 ...
 + pg_rewind --progress --debug '--source-pgdata=standby1'
 '--target-pgdata=standby2'
 pg_rewind: Source timeline history:
 pg_rewind: Target timeline history:
 pg_rewind: 1: 0/0 - 0/0
 pg_rewind: servers diverged at WAL location 0/10392E8 on timeline 1
 pg_rewind: no rewind required

 ...
 Same query
 + psql -d postgres -a --no-psqlrc -p 5001
 select count(*) from a;
   count
 --------
   100000
 (1 row)
 
 + echo 'Different results'
 Different results
 + psql -d postgres -a --no-psqlrc -p 5002
 select count(*) from a;
   count
 --------
   100340
 (1 row)
 
 + echo 'Stop standby1'
 Stop standby1
 + pg_ctl -w -D standby1 stop -m fast
 waiting for server to shut down....2024-08-07 17:15:03.077 +05 [19703]
 DEBUG: logger shutting down
   done
 server stopped
 + echo 'Stop standby2'
 Stop standby2
 + pg_ctl -w -D standby2 stop -m fast
 waiting for server to shut down....2024-08-07 17:15:03.117 +05 [19713]
 DEBUG: logger shutting down
   done
 server stopped
 ```


I cannot reproduce this. On my laptop, the reported "servers diverged at
WAL location" is always higher, and it performs rewind as expected:

pg_rewind: Source timeline history:
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/1138E88 on timeline 1

However while looking at the code, I noticed that the debug-output of
the Source timeline history is broken. See attached patch to fix it.
With that fix, I get:

pg_rewind: Source timeline history:
pg_rewind: 1: 0/0 - 0/1138F00
pg_rewind: 2: 0/1138F00 - 0/0
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: 0 a: 1 0/0-0/1138F00 - b: 1 0/0-0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/1138E88 on timeline 1

That just fixes the debugging output, though, it doesn't change what it
actually does.

Can you try the attached fix and show the output you get with that, please?
Or alternatively, show the contents of the
standby1/pg_wal/00000002.history file.

--
Heikki Linnakangas
Neon (https://neon.tech)