From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Date: 2025-03-13 11:33:24
Message-ID: CABPTF7VreDnD3YiWzx_=PpLRdgOQsH8Xp3fTGMc+r9rGpc3WLg@mail.gmail.com
Lists: pgsql-hackers
Forgot to cc...
---------- Forwarded message ---------
From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Date: Thu, Mar 13, 2025 at 19:15
Subject: Re: [BUG]: the walsender does not update its IO statistics until
it exits
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Hi,
Thanks for working on this! I'm glad to see that the patch (
https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz(at)ip-10-97-1-34(dot)eu-west-3(dot)compute(dot)internal)
has been committed.
Regarding patch 0001, the optimization in pgstat_backend_have_pending_cb
looks good:
bool
pgstat_backend_have_pending_cb(void)
{
- return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ return backend_has_iostats;
}
Additionally, the function pgstat_flush_backend includes the check:
+ if (!pgstat_backend_have_pending_cb())
return false;
However, I think we might need to revise the comment (and possibly the
function name) for clarity:
/*
* Check if there are any backend stats waiting to be flushed.
*/
Originally, this function was intended to check multiple types of backend
statistics, which made sense when PendingBackendStats was the centralized
structure for the various pending backend stats. However, since
PgStat_PendingWalStats was removed from PendingBackendStats earlier, and
this patch now introduces the backend_has_iostats variable, the function's
scope is narrowed further, to IO statistics only. That narrowed scope no
longer matches the original function name and its associated comment.
Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote on Mon, Mar 3, 2025 at 19:54:
> Hi,
>
> On Mon, Mar 03, 2025 at 10:51:19AM +0900, Michael Paquier wrote:
> > On Fri, Feb 28, 2025 at 10:39:31AM +0000, Bertrand Drouvot wrote:
> > > That sounds like a good idea to measure the impact of those extra calls
> > > and see if we'd need to mitigate the impacts. I'll collect some data.
>
> So I did some tests using only one walsender (given the fact that the extra
> lock you mentioned above is "only" for this particular backend).
>
> === Test with pg_receivewal
>
> I was using one pg_receivewal process and did some tests that way:
>
> pgbench -n -c8 -j8 -T60 -f <(echo "SELECT pg_logical_emit_message(true,
> 'test', repeat('0', 1));";)
>
> I did not measure any noticeable extra lag (I did measure the time it took
> for pg_size_pretty(sent_lsn - write_lsn) from pg_stat_replication to be
> back to zero).
>
> During the pgbench run, a "perf record --call-graph fp -p <walsender_pid>"
> would report (perf report -n):
>
> 1. pgstat_flush_backend() appears at about 3%
> 2. pg_memory_is_all_zeros() at about 2.8%
> 3. pgstat_flush_io() at about 0.4%
>
> So it does not look like what we're adding here can be seen as a primary
> bottleneck.
>
> That said, it looks like there is room for improvement in
> pgstat_flush_backend(), and that relying on a "have_iostats"-like variable
> would be better than those pg_memory_is_all_zeros() calls.
>
> That's done in 0001 attached; by doing so, pgstat_flush_backend() now
> appears at about 0.2%.
>
> === Test with pg_recvlogical
>
> Now, it does not look like pg_receivewal had a lot of IO stats to report
> (looking at the pg_stat_get_backend_io() output for the walsender).
>
> Doing the same test with "pg_recvlogical -d postgres -S logical_slot -f
> /dev/null --start" reports much more IO stats.
>
> What I observe without the "have_iostats" optimization is:
>
> 1. I did not measure any noticeable extra lag
> 2. pgstat_flush_io() at about 5.5% (pgstat_io_flush_cb() at about 5.3%)
> 3. pgstat_flush_backend() at about 4.8%
>
> and with the "have_iostats" optimization I now see pgstat_flush_backend()
> at
> about 2.51%.
>
> So it does not look like what we're adding here can be seen as a primary
> bottleneck, but it is probably worth implementing the "have_iostats"
> optimization attached.
>
> Also, while I did not measure any noticeable extra lag, given that
> pgstat_flush_io() shows at about 5.5% and pgstat_flush_backend() at about
> 2.5%, it could still make sense to reduce the frequency of the flush
> calls. Thoughts?
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>