From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Subject: | Re: Race between KeepFileRestoredFromArchive() and restartpoint |
Date: | 2022-07-31 06:17:47 |
Message-ID: | 20220731061747.GA3692882@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote:
> On 6/19/21 16:39, Noah Misch wrote:
> >On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
> >>Recycling and preallocation are wasteful during archive recovery, because
> >>KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
> >>fix the race by adding an XLogCtl flag indicating which regime currently owns
> >>the right to add long-term pg_wal directory entries. In the archive recovery
> >>regime, the checkpointer will not preallocate and will unlink old segments
> >>instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.
> >
> >Here's the implementation. Patches 1-4 suffice to stop the user-visible
> >ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem
> >writes, and it provides some future-proofing.
> >
> >I was tempted to (but did not) just remove preallocation. Creating one file
> >per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
> >expect it's hard to isolate any benefit. Under the old checkpoint_segments=3
> >default, a preallocated segment covered a respectable third of the next
> >checkpoint. Before commit 63653f7 (2002), preallocation created more files.
>
> This also seems like it would fix the link issues we are seeing in [1].
>
> I wonder if that would make it worth a back patch?
Perhaps. It's sad to have multiple people deep-diving into something fixed on
HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points on
this. One alternative would be adding an errhint like "This is known to
happen occasionally during archive recovery, where it is harmless." That has
an unpolished look, but it's low-risk and may avoid deep-dive efforts.
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2022-07-31 06:51:56 | Re: remove more archiving overhead |
Previous Message | Michael Paquier | 2022-07-31 04:11:36 | Re: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger |