From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Date: 2025-03-13 11:33:24
Message-ID: CABPTF7VreDnD3YiWzx_=PpLRdgOQsH8Xp3fTGMc+r9rGpc3WLg@mail.gmail.com
Lists: pgsql-hackers
Forgot to cc...
---------- Forwarded message ---------
From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Date: Thu, Mar 13, 2025 at 19:15
Subject: Re: [BUG]: the walsender does not update its IO statistics until
it exits
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Hi,
Thanks for working on this! I'm glad to see that the patch (
https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz(at)ip-10-97-1-34(dot)eu-west-3(dot)compute(dot)internal)
has been committed.
Regarding patch 0001, the optimization in pgstat_backend_have_pending_cb
looks good:
bool
pgstat_backend_have_pending_cb(void)
{
- return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ return backend_has_iostats;
}
Additionally, the function pgstat_flush_backend includes the check:
+ if (!pgstat_backend_have_pending_cb())
return false;
However, I think we might need to revise the comment (and possibly the
function name) for clarity:
/*
* Check if there are any backend stats waiting to be flushed.
*/
Originally, this function was intended to check multiple types of backend
statistics, which made sense when PendingBackendStats was the centralized
structure for the various pending backend stats. However, since
PgStat_PendingWalStats was removed from PendingBackendStats earlier, and
this patch now introduces the backend_has_iostats variable, the function's
scope is narrowed further, to IO statistics only. That narrowed scope no
longer matches the original function name and its associated comment.
Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote on Mon, Mar 3, 2025 at 19:54:
> Hi,
>
> On Mon, Mar 03, 2025 at 10:51:19AM +0900, Michael Paquier wrote:
> > On Fri, Feb 28, 2025 at 10:39:31AM +0000, Bertrand Drouvot wrote:
> > > That sounds like a good idea to measure the impact of those extra calls
> > > and see if we'd need to mitigate the impacts. I'll collect some data.
>
> So I did some tests using only one walsender (given the fact that the extra
> lock you mentioned above is "only" for this particular backend).
>
> === Test with pg_receivewal
>
> I was using one pg_receivewal process and did some tests that way:
>
> pgbench -n -c8 -j8 -T60 -f <(echo "SELECT pg_logical_emit_message(true,
> 'test', repeat('0', 1));";)
>
> I did not measure any noticeable extra lag (I did measure the time it took
> for pg_size_pretty(sent_lsn - write_lsn) from pg_stat_replication to be
> back to zero).
>
> During the pgbench run, a "perf record --call-graph fp -p <walsender_pid>"
> would report (perf report -n):
>
> 1. pgstat_flush_backend() appears at about 3%
> 2. pg_memory_is_all_zeros() at about 2.8%
> 3. pgstat_flush_io() at about 0.4%
>
> So it does not look like what we're adding here can be seen as a primary
> bottleneck.
>
> That said, it looks like there is room for improvement in
> pgstat_flush_backend(), and that relying on a "have_iostats"-like variable
> would be better than those pg_memory_is_all_zeros() calls.
>
> That's done in 0001 attached; by doing so, pgstat_flush_backend() now
> appears at about 0.2%.
>
> === Test with pg_recvlogical
>
> Now, it does not look like pg_receivewal had a lot of IO stats to report
> (looking at the pg_stat_get_backend_io() output for the walsender).
>
> Doing the same test with "pg_recvlogical -d postgres -S logical_slot -f
> /dev/null --start" reports much more IO stats.
>
> What I observe without the "have_iostats" optimization is:
>
> 1. I did not measure any noticeable extra lag
> 2. pgstat_flush_io() at about 5.5% (pgstat_io_flush_cb() at about 5.3%)
> 3. pgstat_flush_backend() at about 4.8%
>
> and with the "have_iostats" optimization I now see pgstat_flush_backend()
> at
> about 2.51%.
>
> So it does not look like what we're adding here can be seen as a primary
> bottleneck, but it is probably worth implementing the "have_iostats"
> optimization attached.
>
> Also, while I did not measure any noticeable extra lag, given that
> pgstat_flush_io() shows at about 5.5% and pgstat_flush_backend() at about
> 2.5%, it could still make sense to reduce the frequency of the flush
> calls. Thoughts?
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>