From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: .ready and .done files considered harmful |
Date: | 2021-09-20 22:49:09 |
Message-ID: | 65F427BD-6390-47E3-8F6C-2872BCFEE005@amazon.com |
Lists: | pgsql-hackers |
On 9/20/21, 1:42 PM, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2021-Sep-20, Robert Haas wrote:
>
>> I was thinking that this might increase the number of directory scans
>> by a pretty large amount when we repeatedly catch up, then 1 new file
>> gets added, then we catch up, etc.
>
> I was going to say that perhaps we can avoid repeated scans by having a
> bitmap of future files that were found by a scan; so if we need to do
> one scan, we keep track of the presence of the next (say) 64 files in
> our timeline, and then we only have to do another scan when we need to
> archive a file that wasn't present the last time we scanned. However:
This sounds a bit like the other approach discussed earlier in this
thread [0].
>> But I guess your thought process is that such directory scans, even if
>> they happen many times per second, can't really be that expensive,
>> since the directory can't have much in it. Which seems like a fair
>> point. I wonder if there are any situations in which there's not much
>> to archive but the archive_status directory still contains tons of
>> files.
>
> (If we take this stance, which seems reasonable to me, then we don't
> need to optimize.) But perhaps we should complain if we find extraneous
> files in archive_status -- Then it'd be on the users' heads not to leave
> tons of files that would slow down the scan.
The simplest situation I can think of that might be a problem is when
checkpointing is stuck and the .done files are adding up. However,
after the lengthy directory scan, you should still be able to archive
several files without a scan of archive_status. And if you are
repeatedly catching up, the extra directory scans probably aren't
hurting anything. At the very least, this patch doesn't make things
any worse in this area.
BTW I attached a new version of the patch with a couple of small
changes. Specifically, I adjusted some of the comments and moved the
assignment of last_dir_scan to after the directory scan completes.
Before, we were resetting it before the directory scan, so if the
directory scan took too long, you'd still end up scanning
archive_status for every file. I think that's still possible if your
archive_command is especially slow, but archiving isn't going to keep
up anyway in that case.
Nathan
Attachment | Content-Type | Size |
---|---|---|
v7-0001-Do-fewer-directory-scans-of-archive_status.patch | application/octet-stream | 11.3 KB |