Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing
Date: 2025-02-24 11:15:53
Message-ID: xiwwevyjw336dgpqpqwvcuzh7riurucjvdaywsysbmfq5qfmzv@w2jfhutsvs66
Lists: pgsql-hackers

Hi,

On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:
> a051e71e28a added the "timing tracking" in the WAL code path with "only"
> track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
> we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
> both track_io_timing and track_wal_io_timing are enabled.
>
> So I think that your main concern comes from the fact that a051e71e28a added the
> "timing tracking" in the WAL code path with "only" track_io_timing enabled.

Correct.
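
For readers following the thread, the guarded pattern under discussion looks
roughly like the condensed sketch below (illustration only, not the committed
code; the INSTR_TIME_* macros, pg_pwrite() and the two GUCs are real, but the
single combined guard and the surrounding structure are simplified):

  /*
   * Condensed sketch of the guarded timing around a WAL write (assumes the
   * usual PostgreSQL headers; not the literal committed code).
   */
  instr_time  start;
  ssize_t     written;

  if (track_io_timing || track_wal_io_timing)  /* either GUC triggers a clock read */
      INSTR_TIME_SET_CURRENT(start);           /* <- the cost being discussed */
  else
      INSTR_TIME_SET_ZERO(start);

  written = pg_pwrite(fd, buf, nbytes, offset);

  if (track_io_timing || track_wal_io_timing)
  {
      instr_time  end;

      INSTR_TIME_SET_CURRENT(end);             /* second clock read */
      /* accumulate the elapsed time into the pending WAL statistics */
      INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
  }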

> That's a concern we also had and discussed in [1]. The question was: should
> we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
> enabled, or would relying only on track_io_timing be enough?
>
> We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
> the conclusion that relying only on track_io_timing to display the WAL timing
> stats in pg_stat_io was "good" enough.

Sorry to be blunt, but those benchmarks miss the mark. The benchmark

- emits WAL in a transactional fashion, with fairly small transactions, where
  the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
  just the WAL flush disk time. You're doing ~1k-2k TPS - the timing overhead
  would have to be *ginormous* to show up (see the back-of-envelope after this
  list).

- emits WAL at low concurrency, so lock contention isn't really an issue.

- is likely executed on a system with very cheap timing functions - but
  production workloads unfortunately often aren't run on such systems. While
  it's not quite as common as it used to be, virtualized systems often have
  considerably slower clocks.
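
As a rough back-of-envelope (assuming, purely for illustration, about two extra
clock reads per WAL flush): even on the slow hpet clocksource shown below, at
~1.2 us per read, 2,000 flushes/s * 2 reads * 1.2 us is only ~5 ms of clock
overhead per second - well under 1%, invisible when every flush is waiting on a
slow disk. It only starts to matter at much higher WAL rates, with more
expensive clocks, or when the clock read happens under a contended lock.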

On this system it'll likely be fine overhead-wise:

$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 19.17 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.08521  153484414
     2      1.91421    2995376
     4      0.00040        619
     8      0.00014        225
    16      0.00002         35
    32      0.00001          9
    64      0.00001         10
   128      0.00000          3
   256      0.00000          2
   512      0.00000          1

On this system, however, I'd not want to bet on it:

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 1228.91 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.15574       3802
     2     78.51065    1916598
     4     21.26778     519188
     8      0.02069        505
    16      0.03957        966
    32      0.00356         87
    64      0.00193         47
   128      0.00008          2

Greetings,

Andres Freund
