Re: pg_rewind: warn when checkpoint hasn't happened after promotion

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	James Coleman <jtc331(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pg_rewind: warn when checkpoint hasn't happened after promotion
Date:	2022-06-04 13:39:41
Message-ID:	CALj2ACXtdvJKvdJyOJ4D5aVx-tkOW3isKwtuDLBKeS6pfXvH0g@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Jun 4, 2022 at 6:29 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> A few weeks back I sent a bug report [1] directly to the -bugs mailing
> list, and I haven't seen any activity on it (maybe this is because I
> emailed directly instead of using the form?), but I got some time to
> take a look and concluded that a first-level fix is pretty simple.
>
> A quick background refresher: after promoting a standby, rewinding the
> former primary requires that a checkpoint have been completed on the
> new primary after promotion. This is correctly documented. However,
> pg_rewind incorrectly reports to the user that a rewind isn't
> necessary because the source and target are on the same timeline.
>
> Specifically, this happens when the control file on the newly promoted
> server looks like:
>
> Latest checkpoint's TimeLineID: 4
> Latest checkpoint's PrevTimeLineID: 4
> ...
> Min recovery ending loc's timeline: 5
>
> Attached is a patch that detects this condition and reports it as an
> error to the user.
>
> In the spirit of the new-ish "ensure shutdown" functionality I could
> imagine extending this to automatically issue a checkpoint when this
> situation is detected. I haven't started to code that up, however,
> wanting to first get buy-in on that.
>
> 1: https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com
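
For readers following along, the control-file state quoted above can be
checked by hand from pg_controldata output. The following is only a rough
sketch, not the logic of the attached patch: it assumes the condition can be
recognised as "Min recovery ending loc's timeline" being greater than
"Latest checkpoint's TimeLineID", and the function names are hypothetical.

```python
def parse_controldata(text):
    """Split pg_controldata-style 'label: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def checkpoint_pending_after_promotion(text):
    """Return True if the control file looks like the case described above.

    Assumption (not taken from the patch): the promoted server has not yet
    checkpointed when the minimum recovery timeline is ahead of the timeline
    recorded for the latest checkpoint.
    """
    fields = parse_controldata(text)
    checkpoint_tli = int(fields["Latest checkpoint's TimeLineID"])
    min_recovery_tli = int(fields["Min recovery ending loc's timeline"])
    return min_recovery_tli > checkpoint_tli
```

Fed the three lines from the report (4, 4 and 5), this returns True; for a
control file whose checkpoint timeline has caught up, it returns False.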

Thanks. I had a quick look at the issue and the patch. Just a thought:
can't we let pg_rewind issue a checkpoint on the new primary instead
of erroring out, maybe optionally? It might sound like too much, but it
helps pg_rewind be self-reliant, i.e. it avoids needing an external
actor to detect the error and issue a checkpoint on the new primary
before pg_rewind can be run successfully on the old primary to repair
it for use as a new standby.
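
Until something like that exists, the external actor's job can be sketched
roughly as below. This is an illustration of the manual workaround, not a
proposed implementation: the helper names are hypothetical, the connection
string is assumed to point at the new primary, and the connecting role is
assumed to be allowed to run CHECKPOINT.

```python
import subprocess


def build_commands(target_pgdata, source_conninfo):
    """Return the two command lines: checkpoint the new primary, then rewind."""
    # Force a checkpoint on the new primary (the pg_rewind source server).
    checkpoint_cmd = ["psql", "-X", "-d", source_conninfo, "-c", "CHECKPOINT"]
    # Rewind the old primary's data directory against the new primary.
    rewind_cmd = [
        "pg_rewind",
        "--target-pgdata", target_pgdata,
        "--source-server", source_conninfo,
    ]
    return [checkpoint_cmd, rewind_cmd]


def checkpoint_then_rewind(target_pgdata, source_conninfo):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(target_pgdata, source_conninfo):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the execution makes it easy
to print and review the two steps before running them against real servers.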

Regards,
Bharath Rupireddy.

In response to

pg_rewind: warn when checkpoint hasn't happened after promotion at 2022-06-04 12:59:12 from James Coleman

Responses

Re: pg_rewind: warn when checkpoint hasn't happened after promotion at 2022-06-06 05:26:02 from Kyotaro Horiguchi
Re: pg_rewind: warn when checkpoint hasn't happened after promotion at 2022-06-06 12:10:19 from James Coleman