Re: Resetting spilled txn statistics in pg_stat_replication

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-06-20 21:48:36
Message-ID: 20200620214836.7ncmxorvdkmvzepb@development

Hi,

Sorry for neglecting this thread for the last couple of days.

In general, I agree it's unfortunate that the stats are reset when
the walsender exits. This was mostly fine for tuning the spilling
(change the value -> restart -> check the stats), but it is a problem
for proper monitoring. I simply considered these fields similar to the
lag information, rather than looking at them from the "monitoring" POV.

On Thu, Jun 11, 2020 at 11:09:00PM +0900, Masahiko Sawada wrote:
>
> ...
>
>Since the logical decoding intermediate files are written to a per-slot
>directory, I thought that associating these statistics with
>replication slots would also be understandable for users. I was thinking
>of something like a pg_stat_logical_replication_slot view which shows
>slot_name and statistics of only logical replication slots. The view
>always shows one row per existing replication slot, regardless of
>whether logical decoding is running. I think there is no big difference
>in how users use these statistics values between maintaining them at
>slot level and at logical decoding level.
>
>In the logical replication case, since we generally don’t support setting
>a different logical_decoding_work_mem per wal sender, every wal sender
>will decode the same WAL stream with the same setting, meaning they
>will similarly spill intermediate files. Maybe the same is true for the
>statistics of streaming. So having these statistics per logical
>replication might not help as of now.
>

I think the idea to track these stats per replication slot (rather than
per walsender) is the right approach. We should extend the statistics
collector to keep one entry per replication slot and add a new stats
view, called e.g. pg_stat_replication_slots, which could be reset just
like other stats in the collector.
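
To make the per-slot idea concrete, here is a toy Python sketch. It is
not PostgreSQL code and not a proposed implementation: the class names,
method names and the spill_txns / spill_count / spill_bytes counter
names are assumptions chosen for illustration. It only shows the shape
of the idea: one entry per slot, kept outside the walsender so that it
survives a walsender exit, and resettable on request.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class SpillStats:
    """Hypothetical per-slot counters (names are illustrative only)."""
    spill_txns: int = 0
    spill_count: int = 0
    spill_bytes: int = 0


class SlotStatsCollector:
    """Toy stand-in for a collector keeping one entry per replication slot."""

    def __init__(self) -> None:
        self._by_slot: Dict[str, SpillStats] = {}

    def create_slot(self, slot_name: str) -> None:
        # A slot gets an all-zero entry as soon as it exists,
        # whether or not anything is decoding from it.
        self._by_slot.setdefault(slot_name, SpillStats())

    def report_spill(self, slot_name: str, txns: int, count: int, nbytes: int) -> None:
        # Called by whichever walsender currently uses the slot; the entry
        # belongs to the slot, so it outlives any single walsender.
        entry = self._by_slot[slot_name]
        entry.spill_txns += txns
        entry.spill_count += count
        entry.spill_bytes += nbytes

    def reset(self, slot_name: str) -> None:
        # Explicit reset, analogous to resetting other cumulative stats.
        if slot_name not in self._by_slot:
            raise KeyError(slot_name)
        self._by_slot[slot_name] = SpillStats()

    def view(self) -> List[dict]:
        # One row per slot, ordered by name, like a stats view would show.
        return [
            {"slot_name": name, **asdict(self._by_slot[name])}
            for name in sorted(self._by_slot)
        ]
```

With this model, two consecutive walsender sessions reporting against
the same slot simply accumulate into one row, and the counters go back
to zero only when reset() is called for that slot.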

I don't quite understand the discussion about different backends using
different logical_decoding_work_mem values - why would this be an issue?
Surely we have this exact issue e.g. with tracking index vs. sequential
scans and GUCs like random_page_cost. That can change over time too,
different backends may use different values, and yet we don't worry
about resetting the number of index scans for a table etc.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
