Re: Track the amount of time waiting due to cost_delay

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Track the amount of time waiting due to cost_delay
Date: 2024-06-11 09:49:11
Message-ID: Zmgdl2EC3uiheEm8@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Jun 10, 2024 at 05:58:13PM -0400, Robert Haas wrote:
> I'm always suspicious of this sort of thing. I tend to find nothing
> gives me the right answer unless I assume that the actual sleep times
> are randomly and systematically different from the intended sleep
> times but arbitrarily large amounts. I think we should at least do
> some testing: if we measure both the intended sleep time and the
> actual sleep time, how close are they? Does it change if the system is
> under crushing load (which might elongate sleeps) or if we spam
> SIGUSR1 against the vacuum process (which might shorten them)?

Though I (now) think that it would make sense to record the actual delay time
instead (see [1]), I think it's interesting to do some testing as you suggested.

With record_actual_time.txt (attached) applied on top of v1, we can see the
intended and actual wait time.

On my system, "no load at all" except the vacuum running, I see no diff:

Tue Jun 11 09:22:06 2024 (every 1s)

pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 41107 | 41107 | 00:00:42.301851
(1 row)

Tue Jun 11 09:22:07 2024 (every 1s)

pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 42076 | 42076 | 00:00:43.301848
(1 row)

Tue Jun 11 09:22:08 2024 (every 1s)

pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 43045 | 43045 | 00:00:44.301854
(1 row)

But if I launch pg_reload_conf() 10 times in a row, I can see:

Tue Jun 11 09:22:09 2024 (every 1s)

pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 44064 | 44034 | 00:00:45.302965
(1 row)

Tue Jun 11 09:22:10 2024 (every 1s)

pid | relid | phase | time_delayed | actual_time_delayed | duration
-------+-------+---------------+--------------+---------------------+-----------------
54754 | 16385 | scanning heap | 45033 | 45003 | 00:00:46.301858

As we can see the actual wait time is 30ms less than the intended wait time with
this simple test. So I still think we should go with 1) actual wait time and 2)
report the number of waits (as mentioned in [1]). Does that make sense to you?

[1]: https://www.postgresql.org/message-id/Zmf712A5xcOM9Hlg%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
record_actual_time.txt text/plain 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2024-06-11 09:54:30 Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions
Previous Message Ashutosh Bapat 2024-06-11 09:42:30 Re: Conflict Detection and Resolution