Re: Track the amount of time waiting due to cost_delay

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Track the amount of time waiting due to cost_delay
Date: 2024-06-12 12:02:00
Message-ID: ZmmOOPwMFIltkdsN@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Jun 11, 2024 at 01:13:48PM -0400, Robert Haas wrote:
> On Tue, Jun 11, 2024 at 5:49 AM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > As we can see the actual wait time is 30ms less than the intended wait time with
> > this simple test. So I still think we should go with 1) actual wait time and 2)
> > report the number of waits (as mentioned in [1]). Does that make sense to you?
>
> I like the idea of reporting the actual wait time better,

+1

> provided
> that we verify that doing so isn't too expensive. I think it probably
> isn't, because in a long-running VACUUM there is likely to be disk
> I/O, so the CPU overhead of a few extra gettimeofday() calls should be
> fairly low by comparison.

Agree.

> I wonder if there's a noticeable hit when
> everything is in-memory. I guess probably not, because with any sort
> of normal configuration, we shouldn't be delaying after every block we
> process, so the cost of those gettimeofday() calls should still be
> getting spread across quite a bit of real work.

I did some testing, with:

shared_buffers = 12GB
vacuum_cost_delay = 1
autovacuum_vacuum_cost_delay = 1
max_parallel_maintenance_workers = 0
max_parallel_workers = 0

added to a default config file.

A table and all its indexes were fully in memory, the numbers are:

postgres=# SELECT n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid) AND
b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace
GROUP BY n.nspname, c.relname
ORDER BY 3 DESC
LIMIT 11;

nspname | relname | buffers
---------+-------------------+---------
public | large_tbl | 222280
public | large_tbl_pkey | 5486
public | large_tbl_filler7 | 1859
public | large_tbl_filler4 | 1859
public | large_tbl_filler1 | 1859
public | large_tbl_filler6 | 1859
public | large_tbl_filler3 | 1859
public | large_tbl_filler2 | 1859
public | large_tbl_filler5 | 1859
public | large_tbl_filler8 | 1859
public | large_tbl_version | 1576
(11 rows)

The observed timings when vacuuming this table are:

On master:

vacuum phase: cumulative duration
---------------------------------

scanning heap: 00:00:37.808184
vacuuming indexes: 00:00:41.808176
vacuuming heap: 00:00:54.808156

On master patched with actual time delayed:

vacuum phase: cumulative duration
---------------------------------

scanning heap: 00:00:36.502104 (time_delayed: 22202)
vacuuming indexes: 00:00:41.002103 (time_delayed: 23769)
vacuuming heap: 00:00:54.302096 (time_delayed: 34886)

As we can see there is no noticeable degradation while the vacuum entered about
34886 times in this instrumentation code path (cost_delay was set to 1).

> That said, I'm not sure this experiment shows a real problem with the
> idea of showing intended wait time. It does establish the concept that
> repeated signals can throw our numbers off, but 30ms isn't much of a
> discrepancy.

Yeah, the idea was just to show how easy it is to create a 30ms discrepancy.

> I'm worried about being off by a factor of two, or an
> order of magnitude. I think we still don't know if that can happen,
> but if we're going to show actual wait time anyway, then we don't need
> to explore the problems with other hypothetical systems too much.

Agree.

> I'm not convinced that reporting the number of waits is useful. If we
> were going to report a possibly-inaccurate amount of actual waiting,
> then also reporting the number of waits might make it easier to figure
> out when the possibly-inaccurate number was in fact inaccurate. But I
> think it's way better to report an accurate amount of actual waiting,
> and then I'm not sure what we gain by also reporting the number of
> waits.

Sami shared his thoughts in [1] and [2] and so did I in [3]. If some of us still
don't think that reporting the number of waits is useful then we can probably
start without it.

[1]: https://www.postgresql.org/message-id/0EA474B6-BF88-49AE-82CA-C1A9A3C17727%40amazon.com
[2]: https://www.postgresql.org/message-id/E12435E2-5FCA-49B0-9ADB-0E7153F95E2D%40amazon.com
[3]: https://www.postgresql.org/message-id/ZmmGG4e%2BqTBD2kfn%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-06-12 12:09:21 Re: RFC: adding pytest as a supported test framework
Previous Message Alexander Korotkov 2024-06-12 12:00:54 Re: RFC: adding pytest as a supported test framework