Re: Show WAL write and fsync stats in pg_stat_io

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "bharath(dot)rupireddyforpostgres(at)gmail(dot)com" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Date: 2025-02-04 07:55:03
Message-ID: Z6HH150-Aw6ilQYE@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 03, 2025 at 02:34:29PM +0300, Nazir Bilal Yavuz wrote:
> On Mon, 3 Feb 2025 at 11:50, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>> === 1
>>
>> + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
>> + io_start, 1, wal_segment_size);
>>
>> In case wal_init_zero is false, then we're only seeking to the end and write a
>> solitary byte. Then, is reporting "wal_segment_size" correct?
>
> I think you are right. It would make sense to have two
> pgstat_count_io_op_time() calls here. One for wal_segment_size and one
> for solitary byte.

Ah, right. We can just use one call with the size written set
depending on wal_init_zero, because this is still a IOOP_WRITE for a
IOCONTEXT_INIT in both cases. Only the size changes as we are in
XLogFileInitInternal().

>> + /*
>> + * Measure I/O timing to write WAL data, for pg_stat_wal
>> + * and/or pg_stat_io.
>> + */
>> + start = pgstat_prepare_io_time(track_wal_io_timing || track_io_timing);
>>
>> I think that makes sense done that way (as track_wal_io_timing does not have
>> any effect in pgstat_count_io_op_time()). Nit: maybe change the order in the
>> comment to reflect the code ordering? (I mean to say re-word to "for pg_stat_io
>> and/or pg_stat_wal). The order is ok in issue_xlog_fsync() though.

Sure. Fine by me. This makes things a bit more consistent across the
board.

>> === 3
>>
>> What about adding a message in the doc as mentioned in [1]? (I'd not be surprised
>> if some people wonder why the "bytes" fields differ).

Not sure about that. Perhaps you have something in mind?

>> === 4
>>
>> pgstat_tracks_io_object() starts to be hard to read. I wonder if it could be
>> simplified with switch but that could be done after this one goes in.

If you have a proposal, feel free. The current style is something I'm
used to, as well, so that does not bother me much..

At the end, we want this patch and this data, and my benchmarcking is
not showing much differences even if going through a workload with
many pages, so I've used the version relying entirely on
track_io_timing and applied it.

If we split these timings across more GUCs, one thing to consider
would be a third GUC which is neither track_wal_io_timing nor
track_io_timing to keep things independent, but I am not really
convinced that's necessary.

Now, for the rest..
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-02-04 07:59:43 Re: Unsafe access BufferDescriptors array in BufferGetLSNAtomic()
Previous Message jian he 2025-02-04 07:42:28 Re: Non-text mode for pg_dumpall