From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Sami Imseih <samimseih(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Restart pg_usleep when interrupted |
Date: | 2024-07-09 10:44:05 |
Message-ID: | Zo0UdeE3i9d0Wt5E@ip-10-97-1-34.eu-west-3.compute.internal |
Lists: | pgsql-hackers |
Hi,
On Fri, Jul 05, 2024 at 11:49:45AM -0500, Sami Imseih wrote:
>
> > With 50 indexes and 10 parallel workers I can see things like:
> >
> > 2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
> > 2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
> > 2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
> > 2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803
> >
> > This means we waited longer than the maximum allowed cost delay (100ms).
> >
> > With 49 parallel workers, it's worse, as I can see things like:
> >
> > 2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
> > 2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985
> >
> > The first actual wait time is about 1 second (it has been interrupted about
> > 16300 times during this second).
> >
> > To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
> > with an absolute time value; that might be another idea to explore.
> >
> > [1]: https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
> >
>
>
> A more portable approach could be to continue using nanosleep and
> add checks to ensure that nanosleep exits whenever
> it goes past an absolute time. This was suggested by Bertrand in an offline
> conversation. I am not yet fully convinced of this idea, but I am posting the patch
> that implements it for anyone interested in looking.
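For readers following along, the clock_nanosleep() idea from the quoted message can be sketched as below. This is a minimal standalone sketch, assuming a POSIX system that provides clock_nanosleep() and CLOCK_MONOTONIC; the function name sleep_absolute_usec is hypothetical and is not PostgreSQL's pg_usleep(). The key point is that the wake-up time is computed once, so retrying after EINTR cannot accumulate drift.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper (not PostgreSQL code): sleep for `microsec`
 * microseconds, restarting after signal interruptions without drift. */
static void
sleep_absolute_usec(long microsec)
{
    struct timespec deadline;
    int rc;

    assert(microsec >= 0);

    /* Compute the absolute wake-up time once, up front. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += microsec / 1000000L;
    deadline.tv_nsec += (microsec % 1000000L) * 1000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /*
     * With TIMER_ABSTIME, a retry after EINTR reuses the same deadline,
     * so repeated interruptions cannot push the wake-up time later.
     * clock_nanosleep() returns the error number directly (not -1/errno).
     */
    do
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    while (rc == EINTR);
}
```

Unlike a relative nanosleep() restarted with the remaining time, each retry here targets the same absolute instant, which is what the man page recommendation is about.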
Thanks!
I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.
As far as the patch goes (not thoroughly reviewed, as it's still one option among others
being discussed):
+ struct timespec current;
+ float time_diff;
+
+ clock_gettime(PG_INSTR_CLOCK, &current);
+
+ time_diff = (absolute.tv_sec - current.tv_sec) + (absolute.tv_nsec - current.tv_nsec) / 1000000000.0;
I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.
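To illustrate the suggestion, here is a standalone sketch of the "nanosleep() plus absolute deadline" approach where both times are held as a single int64 nanosecond count, loosely modeled on instr_time's ticks field. The helpers now_ticks() and sleep_with_deadline_usec() are hypothetical and are not PostgreSQL's INSTR_TIME_* macros or the patch's actual code; the sketch assumes CLOCK_MONOTONIC is available. The remaining-time check becomes an integer comparison instead of the float time_diff computation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC INT64_C(1000000000)

/* Hypothetical helper: current monotonic time as int64 nanoseconds
 * (a stand-in for instr_time ticks, not PostgreSQL's API). */
static int64_t
now_ticks(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * NS_PER_SEC + ts.tv_nsec;
}

/* Hypothetical pg_usleep-like loop: keep using relative nanosleep(), but
 * recompute the remaining time from a fixed absolute deadline after each
 * interruption, and stop once the deadline has passed. */
static void
sleep_with_deadline_usec(long microsec)
{
    int64_t deadline;

    assert(microsec >= 0);
    deadline = now_ticks() + (int64_t) microsec * 1000;

    for (;;)
    {
        int64_t remaining = deadline - now_ticks();
        struct timespec delay;

        /* Integer tick comparison replaces the float time_diff. */
        if (remaining <= 0)
            break;

        delay.tv_sec = (time_t) (remaining / NS_PER_SEC);
        delay.tv_nsec = (long) (remaining % NS_PER_SEC);

        if (nanosleep(&delay, NULL) == 0)
            break;              /* slept the full remaining time */
        if (errno != EINTR)
            break;              /* unexpected error: give up */
        /* EINTR: loop and recompute from the deadline, not from rem. */
    }
}
```

Recomputing from the fixed deadline, rather than trusting nanosleep()'s rem output, is what bounds the drift when the sleep is interrupted thousands of times.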
> Since sub-millisecond sleep times are not guaranteed, as suggested by
> the vacuum_cost_delay docs (see below), an alternative idea
> is to use clock_nanosleep for vacuum delay when it's available, else
> fall back to WaitLatch.
Wouldn't that further increase the number of cases where sub-millisecond sleeps
won't be guaranteed?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com