Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Li Japin <japinli(at)hotmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-02-24 07:14:05
Message-ID: a7c255be-ca6e-b31f-b836-74eccf9510a8@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/02/15 11:59, Masahiro Ikeda wrote:
> On 2021-02-10 00:51, David G. Johnston wrote:
>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>
>>> I pgindented the patches.
>>
>> ... <function>XLogWrite</function>, which is invoked during an
>> <function>XLogFlush</function> request (see ...).  This is also
>> incremented by the WAL receiver during replication.
>>
>> ("which normally called" should be "which is normally called" or
>> "which normally is called" if you want to keep true to the original)
>> You missed the adding the space before an opening parenthesis here and
>> elsewhere (probably copy-paste)
>>
>> is ether -> is either
>> "This parameter is off by default as it will repeatedly query the
>> operating system..."
>> ", because" -> "as"
>
> Thanks, I fixed them.
>
>> wal_write_time and the sync items also need the note: "This is also
>> incremented by the WAL receiver during replication."
>
> I skipped changing it since I separated the stats for the WAL receiver
> in pg_stat_wal_receiver.
>
>> "The number of times it happened..." -> " (the tally of this event is
>> reported in wal_buffers_full in....) This is undesirable because ..."
>
> Thanks, I fixed it.
>
>> I notice that the patch for WAL receiver doesn't require explicitly
>> computing the sync statistics but does require computing the write
>> statistics.  This is because of the presence of issue_xlog_fsync but
>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>> receiver path does not.  It seems technically straight-forward to
>> refactor here to avoid the almost-duplicated logic in the two places,
>> though I suspect there may be a trade-off for not adding another
>> function call to the stack given the importance of WAL processing
>> (though that seems marginalized compared to the cost of actually
>> writing the WAL).  Or, as Fujii noted, go the other way and don't have
>> any shared code between the two but instead implement the WAL receiver
>> one to use pg_stat_wal_receiver instead.  In either case, this
>> half-and-half implementation seems undesirable.
>
> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)

Thanks for updating the patches!

> I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and
> the stats for WAL receiver is counted in pg_stat_wal_receiver.
> What do you think?

On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-02-24 07:16:34 Re: Single transaction in the tablesync worker?
Previous Message Kyotaro Horiguchi 2021-02-24 07:09:48 Re: Is Recovery actually paused?