Re: per backend WAL statistics

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: per backend WAL statistics
Date: 2025-03-03 09:17:30
Message-ID: Z8VzqpBkj6ja5+EN@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> > Also attaching the patch I mentioned up-thread to address some of Rahila's
> > comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> > and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> > modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> > and B_WAL_WRITER.
>
> Okay by me as it makes the code automatically more flexible if
> pgstat_tracks_backend_bktype() gets tweaked, including the call of
> pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
> able to report backend stats for its WAL I/O. Applied this part as of
> 3f1db99bfabb.

Thanks!

> Something that's still not quite right is that the WAL receiver and
> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> report much data and we expect these processes to run continuously.
> The location where to report stats for the WAL summarizer is simple,
> even if the system is aggressive with WAL this is never called more
> than a couple of times per seconds, like the WAL writer:
>
> @@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
> * so we don't tight-loop.
> */
> HandleWalSummarizerInterrupts();
> +
> + /* report pending statistics to the cumulative stats system */
> + pgstat_report_wal(false);
> +
> summarizer_wait_for_wal();
>
> At this location, the WAL summarizer would wait as there is no data to
> read. The hot path is when we're reading a block.

Did not look closely enough but that sounds right after a quick look.

> The WAL receiver is a different story, because the WaitLatchOrSocket()
> call in the main loop of WalReceiverMain() is *very* aggressive, and
> it's easy to reach this code dozens of times each millisecond. In
> short, we need to be careful, I think, based on how this is currently
> written. My choice is then this path:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
> */
> bool requestReply = false;
>
> + /* report pending statistics to the cumulative stats system */
> + pgstat_report_wal(false);
> +
> /*
> * Check if time since last receive from prim
>
> This would update the stats only when the WAL receiver has nothing to
> do or if wal_receiver_status_interval is reached, so we're not going
> to storm pgstats with updates, still we get some data on a periodic
> basis *because* wal_receiver_status_interval would make sure that the
> path is taken even if we're under a lot of WAL pressure when sending
> feedback messages back to the WAL sender. Of course this needs a
> pretty good comment explaining the choice of this location. What do
> you think?

Same as above, that sounds right after a quick look.

> > It looks like it does not need doc updates. Attached as 0002 as it's somehow
> > un-related to this thread (but not sure it deserves it's dedicated thread though).
>
> I'm wondering if we should not lift more the list of processes listed
> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
> entire paragraph. Not sure if we really have to do that for this
> release, but we could look at that separately.

hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or
update the documentation?

> With 3f1db99bfabb in place, wouldn't it be simpler to update
> pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
> with one call of pgstat_flush_backend()? This saves one call for each
> stats flush.

hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that's open
door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-03-03 09:23:41 Re: Selectively invalidate caches in pgoutput module
Previous Message Suraj Kharage 2025-03-03 09:13:40 Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints