Re: archive status ".ready" files may be created too early

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive status ".ready" files may be created too early
Date: 2019-12-24 02:08:07
Message-ID: 20191224.110807.612496031301992971.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Sat, 21 Dec 2019 01:18:24 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> On 12/18/19, 8:34 AM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:
> > On 12/17/19, 2:26 AM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >> If so, we could amend also that case by marking the last segment as
> >> .ready when XLogWrite writes the first bytes of the next segment. (As
> >> the further corner-case, it still doesn't work if a contination record
> >> spans over trhee or more segments.. But I don't (or want not to) think
> >> we don't need to consider that case..)
> >
> > I'm working on a new version of the patch that will actually look at
> > the WAL page metadata to determine when it is safe to mark a segment
> > as ready for archival. It seems relatively easy to figure out whether
> > a page is the last one for the current WAL record.
>
> I stand corrected. My attempts to add logic to check the WAL records
> added quite a bit more complexity than seemed reasonable to maintain
> in this code path. For example, I didn't anticipate things like
> XLOG_SWITCH records.
>
> I am still concerned about the corner case you noted, but I have yet
> to find a practical way to handle it. You suggested waiting until
> writing the first bytes of the next segment before marking a segment
> as ready, but I'm not sure that fixes this problem either, and I
> wonder if it could result in waiting arbitrarily long before creating
> a ".ready" file in some cases. Perhaps I am misunderstanding your
> suggestion.
>
> Another thing I noticed is that any changes in this area could impact
> archive_timeout. If we reset the archive_timeout timer when we mark
> the segments ready, we could force WAL switches more often. If we do
> not move the timer logic, we could be resetting it before the file is
> ready for the archiver. However, these differences might be subtle
> enough to be okay.

You're right. That doesn't seem to work. Another thing I had in my
mind was that XLogWrite had an additional flag so that
AdvanceXLInsertBuffer can tell not to mark .ready. The function is
called while it *is* writing a complete record. So even if
AdvanceXLInsertBuffer inhibit marking .ready the succeeding bytes
comes soon and we can mark the old segment as .ready at the time.

..
+ * If record_write == false, we don't mark the last segment as .ready
+ * if the caller requested to write up to segment boundary.
..
static void
- XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
+ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool record_write)

When XLogWrite is called with record_write = false, we don't mark
.ready and don't advance lastSegSwitchTime/LSN. At the next time
XLogWrite is called with record_write=true, if lastSegSwitchLSN is
behind the latest segment boundary before or equal to
LogwrtResult.Write, mark the skipped segments as .ready and update
lastSegSwitchTime/LSN.

Does the above make sense?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-12-24 02:10:49 Re: Condition variables vs interrupts
Previous Message Yugo Nagata 2019-12-24 02:01:48 Re: Implementing Incremental View Maintenance