From: Noah Misch <noah(at)leadboat(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Don Seiler <don(at)seiler(dot)us>
Subject: Re: Race between KeepFileRestoredFromArchive() and restartpoint
Date: 2022-08-04 06:24:56
Message-ID: 20220804062456.GA3844351@rfd.leadboat.com
Lists: pgsql-hackers

On Tue, Aug 02, 2022 at 07:37:27AM -0700, Noah Misch wrote:
> On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote:
> > On 7/31/22 02:17, Noah Misch wrote:
> > >On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote:
> > >>On 6/19/21 16:39, Noah Misch wrote:
> > >>>On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
> > >>>>Recycling and preallocation are wasteful during archive recovery, because
> > >>>>KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
> > >>>>fix the race by adding an XLogCtl flag indicating which regime currently owns
> > >>>>the right to add long-term pg_wal directory entries. In the archive recovery
> > >>>>regime, the checkpointer will not preallocate and will unlink old segments
> > >>>>instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.
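
[For reference, a minimal sketch of the flag-based scheme described in the
quoted paragraph above; the field name, helper, and locking are illustrative
assumptions, not the identifiers used in the committed patches.]

    typedef struct XLogCtlData
    {
        /* ... existing fields elided ... */

        /*
         * True while archive recovery owns the right to add long-term pg_wal
         * entries.  While set, the checkpointer skips preallocation and
         * unlinks old segments instead of recycling them (as with
         * wal_recycle=off), and XLogFileInit() declines to install segments.
         */
        bool        recoveryOwnsWalDir;
    } XLogCtlData;

    /* Checkpointer-side test before preallocating/recycling (sketch) */
    static bool
    CheckpointerMayInstallWalSegments(void)
    {
        bool        may_install;

        SpinLockAcquire(&XLogCtl->info_lck);
        may_install = !XLogCtl->recoveryOwnsWalDir;
        SpinLockRelease(&XLogCtl->info_lck);

        return may_install;
    }
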
> > >>>
> > >>>Here's the implementation. Patches 1-4 suffice to stop the user-visible
> > >>>ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem
> > >>>writes, and it provides some future-proofing.
> > >>>
> > >>>I was tempted to (but did not) just remove preallocation. Creating one file
> > >>>per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
> > >>>expect it's hard to isolate any benefit. Under the old checkpoint_segments=3
> > >>>default, a preallocated segment covered a respectable third of the next
> > >>>checkpoint. Before commit 63653f7 (2002), preallocation created more files.
> > >>
> > >>This also seems like it would fix the link issues we are seeing in [1].
> > >>
> > >>I wonder if that would make it worth a back patch?
> > >
> > >Perhaps. It's sad to have multiple people deep-diving into something fixed on
> > >HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points on
> > >this. One alternative would be adding an errhint like "This is known to
> > >happen occasionally during archive recovery, where it is harmless." That has
> > >an unpolished look, but it's low-risk and may avoid deep-dive efforts.
> >
> > I think in this case a HINT might be sufficient to at least keep people from
> > wasting time tracking down a problem that has already been fixed.

Here's a patch to add that HINT. I figure it's better to do this before next
week's minor releases. In the absence of objections, I'll push this around
2022-08-05 14:00 UTC.
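
For anyone not opening the attachment, the shape of the change is roughly as
below; the call site, message level, and variable names here are illustrative
only, not copied from the patch.

    ereport(LOG,
            (errcode_for_file_access(),
             errmsg("could not link file \"%s\" to \"%s\": %m",
                    tmppath, path),
             errhint("This is known to happen occasionally during archive recovery, "
                     "where it is harmless.")));
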
> > However, there is another issue [1] that might argue for a back patch if
> > this patch (as I believe) would fix the issue.
>
> > [1] https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com
>
> That makes sense. Each iteration of the restartpoint recycle loop has a 1/N
> chance of failing. Recovery adds >N files between restartpoints. Hence, the
> WAL directory grows without bound. Is that roughly the theory in mind?

On further reflection, I don't expect it to happen that way. Each failure
message is LOG-level, so the remaining recycles still happen. (The race
condition can yield an ERROR under PreallocXlogFiles(), but recycling is
already done at that point.)
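
To make the LOG-vs-ERROR distinction concrete: ereport(LOG, ...) returns to
its caller, so the recycle loop proceeds past an individual failure, whereas
ereport(ERROR, ...) aborts the enclosing operation. A minimal standalone
sketch, not code from xlog.c:

    #include "postgres.h"

    #include <unistd.h>

    /* Each failed removal is reported at LOG, so the loop keeps going. */
    static void
    remove_all(char **paths, int npaths)
    {
        for (int i = 0; i < npaths; i++)
        {
            if (unlink(paths[i]) != 0)
                ereport(LOG,
                        (errcode_for_file_access(),
                         errmsg("could not remove file \"%s\": %m", paths[i])));
        }
    }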

Attachment: XLogFileInit6-hint-v1.patch (text/plain, 2.1 KB)