Re: .ready and .done files considered harmful

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-19 21:12:46
Message-ID: 38E8A6F6-4D00-442B-B14A-26F7D3AA898E@amazon.com
Lists: pgsql-hackers

On 8/19/21, 5:42 AM, "Dipesh Pandit" <dipesh(dot)pandit(at)gmail(dot)com> wrote:
>> Should we have XLogArchiveNotify(), writeTimeLineHistory(), and
>> writeTimeLineHistoryFile() enable the directory scan instead? Else,
>> we have to exhaustively cover all such code paths, which may be
>> difficult to maintain. Another reason I am bringing this up is that
>> my patch for adjusting .ready file creation [0] introduces more
>> opportunities for .ready files to be created out-of-order.
>
> XLogArchiveNotify() notifies the archiver that a log segment is ready for
> archival by creating a .ready file. This function is called for every log
> segment, so enabling a directory scan here would trigger a directory scan
> for each log segment.

Could we have XLogArchiveNotify() check the archiver state and only
trigger a directory scan if we detect that we are creating an out-of-
order .ready file?
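A minimal sketch of that idea (the names and the shape of the state are hypothetical, not the actual PostgreSQL code): track the highest segment number for which a .ready file has been created, and force a directory scan whenever a notification arrives at or below that high-water mark:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch -- not the real implementation.  The archiver
 * state records the highest segment number notified so far; anything
 * at or below it is an out-of-order .ready file. */
typedef struct ArchNotifyState
{
    uint64_t    last_notified_segno;    /* highest segno notified so far */
    bool        force_dir_scan;         /* tell archiver to rescan */
} ArchNotifyState;

/* Returns true when this notification triggered a directory scan. */
static bool
NotifySegmentReady(ArchNotifyState *state, uint64_t segno)
{
    if (segno <= state->last_notified_segno)
    {
        /* .ready created out of order: archiver may have skipped it */
        state->force_dir_scan = true;
        return true;
    }
    state->last_notified_segno = segno;
    return false;
}
```

That way the common in-order case stays on the fast path and only the rare out-of-order notification pays for a scan.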

> There is one possible scenario that may run into a race condition. If the
> archiver has just finished archiving all .ready files and the next anticipated
> log segment is not available, the archiver takes the fall-back path and scans
> the directory, resetting the flag before the scan begins. If a timeline switch
> or an out-of-order .ready file enables the flag at the same moment the archiver
> resets it, that could race. But even in this case the archiver is about to
> perform a directory scan anyway, so the desired file will be archived as part
> of that scan. Apart from this I can't think of any other scenario that could
> result in a race condition, unless I am missing something.
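For what it's worth, the benign-race argument above can be sketched as follows (hypothetical helper names, assuming the archiver reads and clears the flag in one atomic step before it starts scanning):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch of the flag handoff.  A request that lands just
 * before the read-and-clear is absorbed by the scan the archiver is
 * about to perform anyway; a request that lands just after it survives
 * and triggers the next scan.  Either way no request is permanently
 * lost. */
static atomic_bool force_dir_scan;

/* Setter side, e.g. a timeline switch or out-of-order .ready file. */
static void
RequestDirectoryScan(void)
{
    atomic_store(&force_dir_scan, true);
}

/* Archiver side: consume the request and clear it in one step. */
static bool
ConsumeDirectoryScanRequest(void)
{
    return atomic_exchange(&force_dir_scan, false);
}
```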

What do you think about adding an upper limit to the number of files
we can archive before doing a directory scan? The more I think about
the directory scan flag, the more I believe it is a best-effort tool
that will remain prone to race conditions. If we have a guarantee
that a directory scan will happen within the next N files, there's
probably less pressure to make sure that it's 100% correct.
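Something along these lines, where the limit, constant name, and helper are made up for illustration: after N fast-path archivals, the archiver unconditionally falls back to a full directory scan, so any .ready file the fast path missed is picked up within N files:

```c
#include <stdbool.h>

/* Hypothetical sketch of the "upper limit" idea.  The constant below
 * is an assumed value, not something from the patch. */
#define ARCHIVES_PER_DIR_SCAN 64

typedef struct ArchiverCycle
{
    int         archived_since_scan;    /* fast-path archivals so far */
} ArchiverCycle;

/* Returns true when the next file should come from a directory scan
 * rather than from predicting the next segment name. */
static bool
ShouldDoDirectoryScan(ArchiverCycle *c)
{
    if (c->archived_since_scan >= ARCHIVES_PER_DIR_SCAN)
    {
        c->archived_since_scan = 0;     /* reset after forcing a scan */
        return true;
    }
    c->archived_since_scan++;
    return false;
}
```

With a bound like this, the directory-scan flag only needs to shorten the window, not close it completely.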

On an unrelated note, do we need to add some extra handling for backup
history files and partial WAL files?

Nathan
