Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: g(dot)shelkovy(at)arenadata(dot)io, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
Date: 2024-08-08 07:03:29
Message-ID: 092dd515-b7b4-4fd0-8407-ceca2f02f6ec@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 07/08/2024 15:19, PG Bug reporting form wrote:
> Sometimes pg_rewind mistakenly assumes that nothing needs to be done, which
> results in the replica having data that is not on the master.
>
> ...
> + pg_rewind --progress --debug '--source-pgdata=standby1'
> '--target-pgdata=standby2'
> pg_rewind: Source timeline history:
> pg_rewind: Target timeline history:
> pg_rewind: 1: 0/0 - 0/0
> pg_rewind: servers diverged at WAL location 0/10392E8 on timeline 1
> pg_rewind: no rewind required
>
> ...
> Same query
> + psql -d postgres -a --no-psqlrc -p 5001
> select count(*) from a;
> count
> --------
> 100000
> (1 row)
>
> + echo 'Different results'
> Different results
> + psql -d postgres -a --no-psqlrc -p 5002
> select count(*) from a;
> count
> --------
> 100340
> (1 row)
>
> + echo 'Stop standby1'
> Stop standby1
> + pg_ctl -w -D standby1 stop -m fast
> waiting for server to shut down....2024-08-07 17:15:03.077 +05 [19703]
> DEBUG: logger shutting down
> done
> server stopped
> + echo 'Stop standby2'
> Stop standby2
> + pg_ctl -w -D standby2 stop -m fast
> waiting for server to shut down....2024-08-07 17:15:03.117 +05 [19713]
> DEBUG: logger shutting down
> done
> server stopped
> ```

I cannot reproduce this. On my laptop, the reported "servers diverged at
WAL location" is always higher, and it performs rewind as expected:

g_rewind: Source timeline history:
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/1138E88 on timeline 1

However while looking at the code, I noticed that the debug-output of
the Source timeline history is broken. See attached patch to fix it.
With that fix, I get:

pg_rewind: Source timeline history:
pg_rewind: 1: 0/0 - 0/1138F00
pg_rewind: 2: 0/1138F00 - 0/0
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: 0 a: 1 0/0-0/1138F00 - b: 1 0/0-0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/1138E88 on timeline 1

That just fixes the debugging output, though, it doesn't change what it
actually does.

Can you try the attached fix show the output you get with that please?
Or alternatively, show the contents of the
standby1/pg_wal/00000002.history file.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
fix-print-debug-source-history.patch text/x-patch 519 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message jian he 2024-08-08 07:51:41 Re: BUG #18568: BUG: Result wrong when do group by on partition table!
Previous Message PG Bug reporting form 2024-08-08 03:19:03 BUG #18577: Miss out part invalItems in saved plansource of one CallStmt