Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Date: 2025-04-03 03:55:02
Message-ID: CALDaNm0bwzKVd4z9BCmYdEXFzB3_FNc6zSqg9Mf5xOnNf+6QOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 31 Mar 2025 at 18:35, Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Mon, Mar 24, 2025 at 08:41:20AM +0900, Michael Paquier wrote:
> > On Wed, Mar 19, 2025 at 04:00:49PM +0800, Xuneng Zhou wrote:
> > > Hi,
> > > Moving the other two provides a more complete view of the settings. For
> > > newcomers(like me) to the codebase, seeing all three related values in one
> > > place helps avoid a narrow view of the settings.
> > >
> > > But I am not sure that I understand the cons of this well.
> >
> > While I don't disagree with the use of a hardcoded interval of time to
> > control timing the flush of the WAL sender stats, do we really need to
> > rely on the timing defined by pgstat.c?
>
> No but I thought it could make sense.
>
> > Wouldn't it be simpler to
> > assign one in walsender.c and pick up a different, perhaps higher,
> > value?
>
> I don't have a strong opinion on it so done as suggested above in the attached.
>
> I think that the 1s value is fine because: 1. it is consistent with
> PGSTAT_MIN_INTERVAL and 2. it already needs that the sender is caught up or
> has pending data to send (means it could be higher than 1s already). That said,
> I don't think that would hurt if you think of a higher value.
>
> > At the end the timestamp calculations are free because we can rely on
> > the existing call of GetCurrentTimestamp() for the physical WAL
> > senders to get an idea of the current time.
>
> Yup
>
> > For the logical WAL
> > senders, perhaps we'd better do the reports in WalSndWaitForWal(),
> > actually. There is also a call to GetCurrentTimestamp() that we can
> > rely on in this path.
>
> I think it's better to flush the stats in a shared code path. I think it's
> easier to maintain and that there is no differences between logical and
> physical walsenders that would justify to flush the stats in specific code
> paths.

Couple of suggestions:
1) I felt we can include a similar verification for one of the logical
replication tests too:
+# Wait for the walsender to update its IO statistics.

+# Has to be done before the next restart and far enough from the
+# pg_stat_reset_shared('io') to minimize the risk of polling for too long.
+$node_primary->poll_query_until(
+ 'postgres',
+ qq[SELECT sum(reads) > 0
+ FROM pg_catalog.pg_stat_io
+ WHERE backend_type = 'walsender'
+ AND object = 'wal']
+ )
+ or die
+ "Timed out while waiting for the walsender to update its IO statistics";
+

2) We can comment this in a single line itself:
+ /*
+ * Report IO statistics
+ */

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-04-03 03:59:10 Re: Test to dump and restore objects left behind by regression
Previous Message Peter Smith 2025-04-03 03:37:10 Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.