Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-07 03:02:58
Message-ID: 20220207.120258.310426179780547983.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 07 Feb 2022 10:16:34 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> > On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > > By the way, restart point should start only while recovering, and at
> > > the time of the start both checkpoint.redo and checkpoint LSN are
> > > already past. We shouldn't update minRecovery point after promotion,
> > > but is there any reason for not updating the checkPoint and
> > > checkPointCopy? If we update them after promotion, the
> > > which-LSN-to-show problem would be gone.
> >
> > I tried to find the reason by reading the past discussion, but have
> > not found that yet.
> >
> > If we update checkpoint and REDO LSN at pg_control in that case, we
> > also need to update min recovery point at pg_control? Otherwise the
> > min recovery point at pg_control still indicates the old LSN that
> > previous restart point set.
>
> I had an assumption that the reason I think it shouldn't update
> minRecoveryPoint is that it has been or is going to be reset to
> invalid LSN by promotion and the checkpoint should refrain from
> touching it.

Hmm.. It doesn't seem to be the case. If a server crashes just after
promotion and before requesting the post-promotion checkpoint,
minRecoveryPoint stays at a valid LSN.

(Promoted at 0/7000028)
Database cluster state: in production
Latest checkpoint location: 0/6000060
Latest checkpoint's REDO location: 0/6000028
Latest checkpoint's REDO WAL file: 000000010000000000000006
Minimum recovery ending location: 0/7000090
Min recovery ending loc's timeline: 2

minRecoveryPoint/TLI are ignored whenever a server is started in the
in-production state. In other words, the values are useless. There's
no clear or written reason for not recording the last ongoing
restartpoint after promotion.

Before fast-promotion was introduced, we shouldn't have gotten there
after the end-of-recovery checkpoint (but somehow reached it
sometimes?), but it is quite normal nowadays. Or, to the contrary,
we're expecting it to happen and it is regarded as a normal
checkpoint. So what we should do there nowadays is as follows.

- If any later checkpoint/restartpoint has been established, just skip
  the remaining tasks, then return false. (!chkpt_was_latest)
  (I'm not sure this can happen, though.)

- Update the control file only when archive recovery is still ongoing.

- Otherwise reset minRecoveryPoint, then continue.
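
To make the three steps above concrete, here is a rough sketch of one
possible reading of them. It is not the actual xlog.c code: ControlModel
and finish_restartpoint() are hypothetical names, the struct is a
trimmed-down stand-in for the pg_control fields discussed here, and the
"later checkpoint" test (comparing REDO LSNs) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real definitions live in the PostgreSQL source. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum { DB_IN_ARCHIVE_RECOVERY, DB_IN_PRODUCTION } DBState;

/* Hypothetical, trimmed-down model of the pg_control fields in question. */
typedef struct ControlModel
{
    DBState     state;
    XLogRecPtr  checkPoint;         /* latest checkpoint location */
    XLogRecPtr  redo;               /* latest checkpoint's REDO location */
    XLogRecPtr  minRecoveryPoint;   /* minimum recovery ending location */
} ControlModel;

/*
 * Hypothetical helper modelling the end of a restartpoint.
 * Returns false when the restartpoint is skipped, true otherwise.
 */
static bool
finish_restartpoint(ControlModel *ctl, XLogRecPtr chkpt, XLogRecPtr redo)
{
    /* Assumption: "latest" means the recorded REDO LSN is older than ours. */
    bool        chkpt_was_latest = (ctl->redo < redo);

    /* Step 1: a later checkpoint/restartpoint is already established. */
    if (!chkpt_was_latest)
        return false;

    if (ctl->state == DB_IN_ARCHIVE_RECOVERY)
    {
        /* Step 2: still in archive recovery, so record the restartpoint. */
        ctl->checkPoint = chkpt;
        ctl->redo = redo;
    }
    else
    {
        /* Step 3: already promoted, so only reset minRecoveryPoint. */
        ctl->minRecoveryPoint = InvalidXLogRecPtr;
    }

    /* The caller would continue with the remaining restartpoint tasks. */
    return true;
}
```

With the values from the pg_controldata output above (REDO at
0/6000028, minRecoveryPoint at 0/7000090), a promoted server finishing
a restartpoint with a newer REDO LSN would, under this reading, keep
the recorded checkpoint location and only invalidate minRecoveryPoint.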

Do you have any thoughts or opinions?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
