Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Date: 2023-03-13 03:11:22
Message-ID: CA+hUKGL6Lx0sS5DY8Acf_EzRKKyGtYZS2TYJDu7-FTn-hoZkDg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 11, 2023 at 11:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > I think this is the minimal back-patchable change. I propose to go
> > ahead and do that, and then to kick the ideas about latch API changes
> > into a new thread for the next commitfest.
>
> OK by me, but then again 4753ef37 wasn't my patch.

I'll wait another day to see if Stephen or anyone else who hasn't hit
Monday yet wants to object.

Here also are those other minor tweaks, for master only. I see now
that nanosleep() has been proposed before:

https://www.postgresql.org/message-id/flat/CABQrizfxpBLZT5mZeE0js5oCh1tqEWvcGF3vMRCv5P-RwUY5dQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/4902.1552349020%40sss.pgh.pa.us

Those threads raise the question of whether it should loop on EINTR to
wait out the remaining time. Interruption policy generally seems like
a job for something higher level, and of course all the race condition
and portability problems inherent in signals are fixed by using
latches instead. So I don't think there really is a good answer to
that question. If you loop, you break our programming rules by
wilfully ignoring e.g. global barriers. If you don't loop, it implies
you're relying on the interrupt to cause you to do something, and yet
you might have missed it if it was delivered just before the syscall.
At the time of the earlier thread, maybe it was more acceptable, as it
could only delay cancel for that backend, but now it might even delay
arbitrary other backends, and neither answer to that question can fix
that in a race-free way. Also, back then latches had a SIGUSR1
handler on common systems, but now they don't, so (racy, unreliable)
latch responsiveness has decreased since then. So I think we should
just leave the interface as it is, build better things, and then
eventually retire it. This general topic is also currently being
discussed at:

https://www.postgresql.org/message-id/flat/20230209205929.GA720594%40nathanxps13

I propose to go ahead and make this small improvement anyway because
it'll surely be a while before we delete the last pg_usleep() call,
and it's good to spring-clean old confusing commentary about signals
and portability.
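
To make the EINTR question above concrete, here is a minimal sketch of
the two policies. It is not taken from the attached patches:
sleep_usec_once() and sleep_usec_retry() are hypothetical names used
only for illustration, and the only real API assumed is POSIX
nanosleep().

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not PostgreSQL's pg_usleep): sleep once and
 * return early if a signal handler interrupts the call.  Returns 0 on
 * a full sleep, -1 with errno set (e.g. EINTR) otherwise.
 */
static int
sleep_usec_once(long microsec)
{
	struct timespec delay;

	delay.tv_sec = microsec / 1000000L;
	delay.tv_nsec = (microsec % 1000000L) * 1000L;

	return nanosleep(&delay, NULL);
}

/*
 * Hypothetical helper: the "loop on EINTR" policy.  After each
 * interruption, nanosleep() reports the unslept time in 'remain', and
 * we go back to sleep for that long.  Interrupts are deliberately
 * ignored here, which is the behavior the text above objects to.
 */
static int
sleep_usec_retry(long microsec)
{
	struct timespec delay;
	struct timespec remain;

	delay.tv_sec = microsec / 1000000L;
	delay.tv_nsec = (microsec % 1000000L) * 1000L;

	while (nanosleep(&delay, &remain) == -1)
	{
		if (errno != EINTR)
			return -1;		/* real error, e.g. EINVAL */
		delay = remain;		/* resume with the time left */
	}
	return 0;
}
```

Neither variant closes the race described above: a signal delivered
just before the nanosleep() call is not noticed by either one.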

Attachment                                                    Content-Type  Size
0001-Fix-fractional-vacuum_cost_delay.patch                   text/x-patch  2.4 KB
0002-Update-obsolete-comment-about-pg_usleep-accuracy.patch   text/x-patch  1.5 KB
0003-Use-nanosleep-to-implement-pg_usleep.patch               text/x-patch  2.6 KB
