From: | James Coleman <jtc331(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | pg_rewind fails to detect timeline change |
Date: | 2022-05-16 21:02:06 |
Message-ID: | CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com |
Lists: | pgsql-bugs |
During recent planned database failovers (from primary to synchronous
replica) we noticed an interesting result while trying to run pg_rewind
prior to reintroducing the former primary as a streaming replica:
source and target cluster are on the same timeline
no rewind required
That obviously can't be right, given that the new primary had a promote command
issued to it and promotion increments the timeline.
Upon further investigation, the second time this happened, I noticed that the
control data for the new primary looked like:
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
...
Min recovery ending loc's timeline: 5
while the former primary had 4 for all three values.
After issuing a manual checkpoint to the new primary the control data looks
like:
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
...
Min recovery ending loc's timeline: 0
I'm not sure why the last value is 0 (maybe it's really null?), but that's
a distraction here.
After that checkpoint, running pg_rewind works as expected. It seems to me
that pg_rewind shouldn't be reporting that the timelines are the same when
they definitely are not. I'm guessing pg_rewind is looking at the control
data's latest checkpoint timeline ID rather than asking the streaming
replication protocol for the current timeline, though I haven't yet looked
at the code to verify that guess.
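To make that guess concrete, here is a small Python model of it, fed with the control data values quoted above. It assumes pg_rewind compares only the "Latest checkpoint's TimeLineID" field of the two clusters; that assumption is unverified, and the function names are hypothetical, not taken from the pg_rewind source.

```python
# Hypothetical model of the guessed pg_rewind behavior; NOT the real source.
# Labels below mirror the pg_controldata output quoted in this report.

def parse_controldata(text: str) -> dict[str, str]:
    """Parse 'Label: value' lines as printed by pg_controldata."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as "..."
            fields[label.strip()] = value.strip()
    return fields


def checkpoint_tli(text: str) -> int:
    """Timeline ID recorded by the latest checkpoint."""
    return int(parse_controldata(text)["Latest checkpoint's TimeLineID"])


def guessed_same_timeline(source: str, target: str) -> bool:
    """Assumed check: compare only the latest checkpoint's timeline IDs."""
    return checkpoint_tli(source) == checkpoint_tli(target)


# Values from this report, before the manual CHECKPOINT on the new primary.
NEW_PRIMARY = """\
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
Min recovery ending loc's timeline: 5
"""
FORMER_PRIMARY = """\
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
Min recovery ending loc's timeline: 4
"""

print(guessed_same_timeline(NEW_PRIMARY, FORMER_PRIMARY))  # True, despite the promotion
```

Under this assumed check, the pre-checkpoint values report "same timeline" even though the new primary already records timeline 5 in its min recovery field; once its checkpoint TimeLineID becomes 5, the comparison flips.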
I believe we either need to be using the most up-to-date timeline or, if a
checkpoint is required, at least detecting this situation and reporting it
to the user rather than giving them incorrect information.
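The two options above can be sketched as another hypothetical Python model rather than a patch against pg_rewind: either compare the larger of each cluster's checkpoint and min recovery timelines, or refuse and ask for a CHECKPOINT when the source's min recovery timeline is ahead of its checkpoint timeline. The `ControlData` type, the function names, and the treatment of 0 as "unset" (suggested by the post-checkpoint output above) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ControlData:
    """Hypothetical subset of a cluster's control data (timeline IDs only)."""
    checkpoint_tli: int    # "Latest checkpoint's TimeLineID"
    min_recovery_tli: int  # "Min recovery ending loc's timeline"; 0 assumed to mean unset


def effective_tli(cd: ControlData) -> int:
    # As observed in this report, a just-promoted server may show its new
    # timeline only in the min recovery field until its next checkpoint,
    # so take the larger of the two values.
    return max(cd.checkpoint_tli, cd.min_recovery_tli)


def check_timelines(source: ControlData, target: ControlData,
                    use_latest: bool = True) -> str:
    """Sketch of both options: use the newer TLI, or report stale data."""
    if not use_latest and source.min_recovery_tli > source.checkpoint_tli:
        # Option 2: don't guess; tell the user what to do instead.
        return "source control data is stale; run CHECKPOINT on the source first"
    if use_latest:
        # Option 1: compare the most up-to-date timeline each side records.
        src, tgt = effective_tli(source), effective_tli(target)
    else:
        src, tgt = source.checkpoint_tli, target.checkpoint_tli
    if src == tgt:
        return "source and target cluster are on the same timeline"
    return f"timelines differ (source {src}, target {tgt})"


new_primary = ControlData(checkpoint_tli=4, min_recovery_tli=5)
former_primary = ControlData(checkpoint_tli=4, min_recovery_tli=4)
print(check_timelines(new_primary, former_primary))
# timelines differ (source 5, target 4)
print(check_timelines(new_primary, former_primary, use_latest=False))
# source control data is stale; run CHECKPOINT on the source first
```

Taking the maximum is only a heuristic in this model; it presumes the min recovery timeline never runs ahead of the server's actual current timeline, which is not verified here. The reporting option avoids that question entirely at the cost of an extra manual step.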
Note: this is Postgres 11 on Debian Stretch.
Thanks,
James Coleman