Re: Replication slot stats misgivings

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-19 03:29:23
Message-ID: CAD21AoDK5Rq1dVEVqW-x+XWU6=ji7RN063tMYKACZXAp7QoZBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for the update! The patch looks good to me.
> >
>
> I have pushed the first patch. Comments on the next patch
> v13-0001-Use-HTAB-for-replication-slot-statistics:
> 1.
> + /*
> + * Check for all replication slots in stats hash table. We do this check
> + * when replSlotStats has more than max_replication_slots entries, i.e,
> + * when there are stats for the already-dropped slot, to avoid frequent
> + * call SearchNamedReplicationSlot() which acquires LWLock.
> + */
> + if (replSlotStats && hash_get_num_entries(replSlotStats) >
> max_replication_slots)
> + {
> + PgStat_ReplSlotEntry *slotentry;
> +
> + hash_seq_init(&hstat, replSlotStats);
> + while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
> + {
> + if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
> + pgstat_report_replslot_drop(NameStr(slotentry->slotname));
> + }
> + }
>
> Is SearchNamedReplicationSlot() so frequently used that we need to do
> this only when the hash table has entries more than
> max_replication_slots? I think it would be better if we can do it
> without such a condition to reduce the chances of missing the slot
> stats. We don't have any such restrictions for any other cases in this
> function.

Please see below comment on #4.

>
> I think it is better to add CHECK_FOR_INTERRUPTS in the above while loop?

Agreed.

>
> 2.
> /*
> * Replication slot statistics kept in the stats collector
> */
> -typedef struct PgStat_ReplSlotStats
> +typedef struct PgStat_ReplSlotEntry
>
> I think the comment above this structure can be changed to "The
> collector's data per slot" or something like that. Also, if we have to
> follow table/function/db style, then probably this structure should be
> named as PgStat_StatReplSlotEntry.

Agreed.

>
> 3.
> - * create the statistics for the replication slot.
> + * create the statistics for the replication slot. In case where the
> + * message for dropping the old slot gets lost and a slot with the same is
>
> /the same is/the same name is/.
>
> Can we mention something similar to what you have added here in docs as well?

Agreed.

>
> 4.
> +CREATE VIEW pg_stat_replication_slots AS
> + SELECT
> + s.slot_name,
> + s.spill_txns,
> + s.spill_count,
> + s.spill_bytes,
> + s.stream_txns,
> + s.stream_count,
> + s.stream_bytes,
> + s.total_txns,
> + s.total_bytes,
> + s.stats_reset
> + FROM pg_replication_slots as r,
> + LATERAL pg_stat_get_replication_slot(slot_name) as s
> + WHERE r.datoid IS NOT NULL; -- excluding physical slots
> ..
> ..
>
> -/* Get the statistics for the replication slots */
> +/* Get the statistics for the replication slot */
> Datum
> -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
> {
> #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> + text *slotname_text = PG_GETARG_TEXT_P(0);
> + NameData slotname;
>
> I think with the above changes getting all the slot stats has become
> much costlier. Is there any reason why can't we get all the stats from
> the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.
Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

As another design, we can get all stats from the hash table in one
shot as you suggested. If we do that, it's better to check garbage
slot stats every time pgstat_vacuum_stat() is called so the view
doesn't show those stats but cannot avoid that completely.

I'll change the code pointed out by #1 and #4 according to this design
discussion.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-04-19 03:58:06 Re: Remove redundant variable from transformCreateStmt
Previous Message Michael Paquier 2021-04-19 02:42:58 Re: doc review for v14