From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
Subject: | Re: per backend WAL statistics |
Date: | 2025-03-03 09:17:30 |
Message-ID: | Z8VzqpBkj6ja5+EN@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> > Also attaching the patch I mentioned up-thread to address some of Rahila's
> > comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> > and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> > modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> > and B_WAL_WRITER.
>
> Okay by me as it makes the code automatically more flexible if
> pgstat_tracks_backend_bktype() gets tweaked, including the call of
> pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
> able to report backend stats for its WAL I/O. Applied this part as of
> 3f1db99bfabb.
Thanks!
> Something that's still not quite right is that the WAL receiver and
> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> report much data and we expect these processes to run continuously.
> The location where to report stats for the WAL summarizer is simple,
> even if the system is aggressive with WAL this is never called more
> than a couple of times per seconds, like the WAL writer:
>
> @@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
> * so we don't tight-loop.
> */
> HandleWalSummarizerInterrupts();
> +
> + /* report pending statistics to the cumulative stats system */
> + pgstat_report_wal(false);
> +
> summarizer_wait_for_wal();
>
> At this location, the WAL summarizer would wait as there is no data to
> read. The hot path is when we're reading a block.
Did not look closely enough but that sounds right after a quick look.
> The WAL receiver is a different story, because the WaitLatchOrSocket()
> call in the main loop of WalReceiverMain() is *very* aggressive, and
> it's easy to reach this code dozens of times each millisecond. In
> short, we need to be careful, I think, based on how this is currently
> written. My choice is then this path:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
> */
> bool requestReply = false;
>
> + /* report pending statistics to the cumulative stats system */
> + pgstat_report_wal(false);
> +
> /*
> * Check if time since last receive from prim
>
> This would update the stats only when the WAL receiver has nothing to
> do or if wal_receiver_status_interval is reached, so we're not going
> to storm pgstats with updates, still we get some data on a periodic
> basis *because* wal_receiver_status_interval would make sure that the
> path is taken even if we're under a lot of WAL pressure when sending
> feedback messages back to the WAL sender. Of course this needs a
> pretty good comment explaining the choice of this location. What do
> you think?
Same as above, that sounds right after a quick look.
> > It looks like it does not need doc updates. Attached as 0002 as it's somehow
> > un-related to this thread (but not sure it deserves it's dedicated thread though).
>
> I'm wondering if we should not lift more the list of processes listed
> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
> entire paragraph. Not sure if we really have to do that for this
> release, but we could look at that separately.
hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or
update the documentation?
> With 3f1db99bfabb in place, wouldn't it be simpler to update
> pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
> with one call of pgstat_flush_backend()? This saves one call for each
> stats flush.
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that's open
door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-03-03 09:23:41 | Re: Selectively invalidate caches in pgoutput module |
Previous Message | Suraj Kharage | 2025-03-03 09:13:40 | Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints |