Re: Restart pg_usleep when interrupted

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restart pg_usleep when interrupted
Date: 2024-08-21 15:06:30
Message-ID: ZsYCdsRnidLOiCM5@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Aug 20, 2024 at 02:25:19PM -0500, Sami Imseih wrote:
> > As it looks like we have a consensus that reducing the number of interrupts also
> > makes sense, I just provided a rebase version of the 1 Hz version (see [0], that
> > also makes clear in the doc that the new field might show slightly old values).
>
> That makes sense. However, I suspect the "1 Hz" work code will no longer
> be needed if CF entry 5118 [1] mentioned by Thomas [2] a few days back
> goes through. Maybe this extra work can be removed if [1] goes through.
> What do you think?

Yeah agree that the "1 Hz" work wouldn't be needed anymore if the CF entry 5118
goes through *and* if vacuum delays keep doing pg_usleep() (as the leader won't
receive SIGUSR1 from the parallel workers while executing nanosleep() anymore).
It could still receive SIGHUP or such but that's outside of the PqMsg_Progress
case though.

> With regards to CF 5118 and what it means to the current discussion, below
> are my thoughts.
>
> I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in place,
> pg_usleep is able to sleep the full requested, as mentioned by Thomas [3]. This is
> because certain interrupts like Parallel Message and others are not signaled
> by SIGUSR1 any longer, but with latches.
>

Yeah.

> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
>
> While something like CF 5118 will take care of point #1,

I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I
agree that the leader won't be interrupted by PqMsg_Progress anymore.

> it will not deal
> with #2. Also, the v11 [4] patch does not do #2 either. So I think
> in the sleep loop, we need a C_F_I call. The same type of loop can
> also be used to call WaitForSingleObject.
>
> If CF 5118 gets committed, will still need similar loop that calls C_F_I,
> but the function will need to call WaitLatchUs [5].
>
> Thoughts?

If CF 5118 gets committed, then I think we would need to move to using WaitLatchUs()
to react to urgent request. We'd also need to find a way to ensure that we'd
wait for the full duration of the requested time depending of the reason why we
waked up (well, only if we agree that 1/ is needed and I'm not sure we got a
consensus).

So I think that:

1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break anything
if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would still
prevent the leader to be waked up too frequently by the parallel workers.

2. still discuss the "need" and get a consensus regarding a sleep that could
ensure the wait duration (in some cases and depending of the reason why the
process is waked up).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-08-21 15:28:57 Re: type cache cleanup improvements
Previous Message Robert Haas 2024-08-21 15:03:47 Re: pg_combinebackup does not detect missing files