Re: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
Date: 2022-03-25 17:29:32
Message-ID: FF923252-1631-4B16-A3A1-36909D6F644F@anarazel.de
Lists: pgsql-hackers

Hi,

On March 25, 2022 9:56:38 AM PDT, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>On Fri, Mar 25, 2022 at 3:40 AM Bharath Rupireddy
><bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>> Since the server spins up checkpointer process [1] while the startup
>> process performs recovery, isn't it a good idea to make
>> end-of-recovery completely optional for the users or at least run it
>> in non-wait mode so that the server will be available faster. The next
>> checkpointer cycle will take care of performing the EOR checkpoint
>> work, if user chooses to skip the EOR or the checkpointer will run EOR
>> checkpoint in background, if user chooses to run it in the non-wait
>> mode (without CHECKPOINT_WAIT flag). Of course by choosing this
>> option, users must be aware of the fact that the extra amount of
>> recovery work that needs to be done if a crash happens from the point
>> EOR gets skipped or runs in non-wait mode until the next checkpoint.
>> But the advantage that users get is the faster server availability.
>
>I think that we should remove end-of-recovery checkpoints completely
>and instead use the end-of-recovery WAL record (cf.
>CreateEndOfRecoveryRecord). However, when I tried to do that, I ran
>into some problems:
>
>http://postgr.es/m/CA+TgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F+oJT3D2mQ@mail.gmail.com
>
>The second problem described in that email has subsequently been
>fixed, I believe, but the first one remains.

Seems we could deal with that by making latestCompleted a 64-bit xid? Then there would never be cases where we have to retreat back into such early xids?
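
To make the 64-bit idea concrete, here is a minimal sketch. It is not PostgreSQL's code: Xid32, Xid64 and both helpers are made-up names, and the special handling of non-normal xids is left out. 32-bit xids have to be compared modulo 2^32, so ordering is circular; a 64-bit xid can be compared as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, not PostgreSQL's definitions. */
typedef uint32_t Xid32;
typedef uint64_t Xid64;

/*
 * Circular comparison: a "precedes" b when the difference, read as a
 * signed 32-bit value, is negative. Only meaningful while the two xids
 * are less than 2^31 apart.
 */
static bool
xid32_precedes(Xid32 a, Xid32 b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Linear comparison: with 64 bits the counter does not wrap in practice,
 * so ordinary integer ordering is enough.
 */
static bool
xid64_precedes(Xid64 a, Xid64 b)
{
    return a < b;
}
```

With these definitions, a 32-bit xid just below 2^32 counts as preceding xid 5, while the same two values held in 64 bits order the other way around, so there is no wraparound case to handle.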

A random note from a conversation with Thomas a few days ago: We still perform timeline increases with checkpoints in some cases. Might be worth fixing as a step towards just using EOR.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
