From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver |
Date: | 2025-03-05 08:04:44 |
Message-ID: | Z8gFnH4o3jBm5BRz@ip-10-97-1-34.eu-west-3.compute.internal |
Lists: | pgsql-hackers |
Hi,
On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
> Hi all,
>
> While doing some monitoring of a replication setup for a stable
> branch, I have been surprised by the fact that we have never tracked
> WAL statistics for the WAL receiver in pg_stat_wal because we have
> never bothered to update its code so that WAL stats are reported.
Nice catch!
> This
> is relevant for the write and sync counts and timings.
Also for sync? Sync looks fine, as issue_xlog_fsync() is called in
XLogWalRcvFlush(), no?
> As of f4694e0f35b2, the situation is better thanks to the addition of
> a pgstat_report_wal() in the WAL receiver main loop, so we have some
> data. However, we are only able to gather the data for segment syncs
> and initializations, not the writes themselves as these are managed by
> an independent code path, XLogWalRcvWrite().
>
> A second thing lacking in XLogWalRcvWrite() is a wait event around
> the pg_pwrite() call, which is useful as the WAL receiver is listed in
> pg_stat_activity. Note that it is possible to re-use the same wait
> event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL
> receiver does not rely on the write and flush calls from xlog.c when
> doing its work, and both have the same meaning, aka they write WAL.
> The fsync calls use issue_xlog_fsync() and the segment inits happen in
> XLogFileInit().
>
> Perhaps there's a point in backpatching a portion of what's in the
> attached patch (the wait event?), but I am not planning to bother much
> with the stable branches based on the lack of complaints.
We're not emitting some statistics, so I think that it's hard for users to
complain about something they don't/can't see.
> If you
> have an opinion about that, please feel free.
I'm tempted to say that the WAL receiver part of f4694e0f35b2 should be
backpatched, as well as what you're doing here.
```diff
+    /*
+     * Measure I/O timing to write WAL data, for pg_stat_io.
+     */
+    start = pgstat_prepare_io_time(track_wal_io_timing);
+
+    pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
     byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
+    pgstat_report_wait_end();
```
Same logic as in XLogWrite(), and I don't think there is a need for a
dedicated wait event, so LGTM.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com