Re: Should the archiver process always make sure that the timeline history files exist in the archive?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jyih(at)vmware(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should the archiver process always make sure that the timeline history files exist in the archive?
Date: 2023-08-24 08:15:00
Message-ID: 20230824.171500.418533297821162665.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 16 Aug 2023 07:33:29 +0000, Jimmy Yih <jyih(at)vmware(dot)com> wrote in
> Hello pgsql-hackers,
>
> After doing some more debugging on the matter, I believe this issue might be a
> minor regression from commit 5332b8cec541. Prior to that commit, the archiver
> process when first started on a previously promoted primary would have all the
> timeline history files marked as ready for immediate archiving. If that had
> happened, none of my mentioned failure scenarios would be theoretically possible
> (barring someone manually deleting the timeline history files). With that in
> mind, I decided to look more into my Question 1 and created a patch proposal.
> The attached patch will try to archive the current timeline history file if it
> has not been archived yet when the archiver process starts up.

In essence, after a subtle but not necessarily wrong sequence of steps,
there's a case where a primary server lacks the timeline history file
for the current timeline in both pg_wal and the archive, even though
that timeline is larger than 1. This primary can start, but a new
standby created from the primary cannot start streaming, as it can't
fetch the timeline history file for the initial TLI.
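
To make the problem state concrete, here is a minimal sketch that checks whether a server is in it. It assumes the archive is a plain directory reachable from the host (real archives are often reached only through archive_command), and the helper names are made up for illustration. The only PostgreSQL facts it relies on are the history file naming (eight uppercase hex digits plus ".history") and that timeline 1 has no history file.

```python
from pathlib import Path


def history_file_name(tli: int) -> str:
    """Name of the timeline history file for `tli`, e.g. 00000002.history."""
    return f"{tli:08X}.history"


def find_history_file(tli: int, pg_wal: str, archive_dir: str) -> list[str]:
    """Return the locations ("pg_wal", "archive") that hold the history file."""
    name = history_file_name(tli)
    found = []
    # Assumption: the archive is an ordinary local directory.
    for label, directory in (("pg_wal", Path(pg_wal)), ("archive", Path(archive_dir))):
        if (directory / name).is_file():
            found.append(label)
    return found


def is_problem_case(tli: int, pg_wal: str, archive_dir: str) -> bool:
    """True if the current timeline needs a history file but none can be found."""
    # Timeline 1 never has a history file, so only TLI > 1 can be affected.
    return tli > 1 and not find_history_file(tli, pg_wal, archive_dir)
```

For example, is_problem_case(2, "/path/to/pgdata/pg_wal", "/path/to/archive") would be true on a primary in the state described above; both paths are placeholders.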

A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that..)

B. Given that the steps are valid, I concur with what is described in
the test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)

C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI, and that archive mode must be enabled before the latest
timeline switch, if any.
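
The behaviour floated in option B can be modeled roughly as below. This is a toy model, not walreceiver code: the function, the callbacks, and the exception are all hypothetical, and the loop shape (walk the timelines from the starting TLI to the primary's TLI, fetching any history file that is missing locally) is an assumption made for the sake of the sketch. The only point it illustrates is that a fetch failure is tolerated for the initial TLI alone.

```python
from typing import Callable


class HistoryFetchError(Exception):
    """Hypothetical error: the primary could not supply a history file."""


def fetch_history_files(
    first_tli: int,
    last_tli: int,
    have_locally: Callable[[int], bool],
    fetch: Callable[[int], None],
) -> list[int]:
    """Fetch missing history files; return the TLIs whose fetch was skipped."""
    skipped = []
    for tli in range(first_tli, last_tli + 1):
        # Timeline 1 has no history file; files already present need no fetch.
        if tli == 1 or have_locally(tli):
            continue
        try:
            fetch(tli)
        except HistoryFetchError:
            # Option B: overlook the error only for the initial TLI.
            if tli != first_tli:
                raise
            skipped.append(tli)
    return skipped
```

A failure on any later timeline still propagates, which keeps the relaxation as narrow as possible; whether even that much is safe is exactly what remains to be verified.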

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
