Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-09-05 13:41:20
Message-ID: CAD21AoCu+VoSLKqKmVhZToXn5+84k_skSZRAQRXRAN+ObuZfaA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 2, 2021 at 12:06 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> >
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
> >
>
> I have some initial feedback on the v12-0001 patch.
> Most of these are suggested improvements to wording, and some typo fixes.

Thank you for the comments!

>
>
> (0) Patch comment
>
> Suggestion to improve the patch comment:
>
> BEFORE:
> Add pg_stat_subscription_errors statistics view.
>
> This commits adds new system view pg_stat_logical_replication_error,

Oops, I realized that it should be pg_stat_subscription_errors.

> showing errors happening during applying logical replication changes
> as well as during performing initial table synchronization.
>
> The subscription error entries are removed by autovacuum workers when
> the table synchronization competed in table sync worker cases and when
> dropping the subscription in apply worker cases.
>
> It also adds SQL function pg_stat_reset_subscription_error() to
> reset the single subscription error.
>
> AFTER:
> Add a subscription errors statistics view "pg_stat_subscription_errors".
>
> This commits adds a new system view pg_stat_logical_replication_errors,
> that records information about any errors which occur during application
> of logical replication changes as well as during performing initial table
> synchronization.

Views don't hold any data themselves, so I think "show information" is
more appropriate than "records information" here. Thoughts?

>
> The subscription error entries are removed by autovacuum workers after
> table synchronization completes in table sync worker cases and after
> dropping the subscription in apply worker cases.
>
> It also adds an SQL function pg_stat_reset_subscription_error() to
> reset a single subscription error.
>
>
>
> doc/src/sgml/monitoring.sgml:
>
> (1)
> BEFORE:
> + <entry>One row per error that happened on subscription, showing information about
> + the subscription errors.
> AFTER:
> + <entry>One row per error that occurred on subscription, providing information about
> + each subscription error.

Fixed.

>
> (2)
> BEFORE:
> + The <structname>pg_stat_subscription_errors</structname> view will contain one
> AFTER:
> + The <structname>pg_stat_subscription_errors</structname> view contains one
>

I think the descriptions of the other statistics views also say "XXX view
will contain ...".

>
> (3)
> BEFORE:
> + Name of the database in which the subscription is created.
> AFTER:
> + Name of the database in which the subscription was created.

Fixed.

>
> (4)
> BEFORE:
> + OID of the relation that the worker is processing when the
> + error happened.
> AFTER:
> + OID of the relation that the worker was processing when the
> + error occurred.
>

Fixed.

>
> (5)
> BEFORE:
> + Name of command being applied when the error happened. This
> + field is always NULL if the error is reported by
> + <literal>tablesync</literal> worker.
> AFTER:
> + Name of command being applied when the error occurred. This
> + field is always NULL if the error is reported by a
> + <literal>tablesync</literal> worker.

Fixed.

> (6)
> BEFORE:
> + Transaction ID of publisher node being applied when the error
> + happened. This field is always NULL if the error is reported
> + by <literal>tablesync</literal> worker.
> AFTER:
> + Transaction ID of the publisher node being applied when the error
> + happened. This field is always NULL if the error is reported
> + by a <literal>tablesync</literal> worker.

Fixed.

> (7)
> BEFORE:
> + Type of worker reported the error: <literal>apply</literal> or
> + <literal>tablesync</literal>.
> AFTER:
> + Type of worker reporting the error: <literal>apply</literal> or
> + <literal>tablesync</literal>.

Fixed.

>
> (8)
> BEFORE:
> + Number of times error happened on the worker.
> AFTER:
> + Number of times the error occurred in the worker.
>
> [or "Number of times the worker reported the error" ?]

I prefer "Number of times the error occurred in the worker."

>
> (9)
> BEFORE:
> + Time at which the last error happened.
> AFTER:
> + Time at which the last error occurred.

Fixed.

>
> (10)
> BEFORE:
> + Error message which is reported last failure time.
> AFTER:
> + Error message which was reported at the last failure time.
>
> Maybe this should just say "Last reported error message" ?

Fixed.

>
>
> (11)
> You shouldn't call hash_get_num_entries() on a NULL pointer.
>
> Suggest swapping lines, as shown below:
>
> BEFORE:
> + int32 nerrors = hash_get_num_entries(subent->suberrors);
> +
> + /* Skip this subscription if not have any errors */
> + if (subent->suberrors == NULL)
> + continue;
> AFTER:
> + int32 nerrors;
> +
> + /* Skip this subscription if not have any errors */
> + if (subent->suberrors == NULL)
> + continue;
> + nerrors = hash_get_num_entries(subent->suberrors);

Right. Fixed.
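
To illustrate the ordering issue in (11), here is a minimal standalone
sketch. It is not the pgstat.c code: DemoTable, DemoSubEntry,
demo_num_entries() and demo_total_errors() are hypothetical stand-ins for
the hash table, the subscription entry, and hash_get_num_entries(), used
only to show that the NULL check has to come before the table is queried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table; not PostgreSQL's HTAB. */
typedef struct DemoTable
{
    long        nentries;
} DemoTable;

/* Hypothetical stand-in for a subscription stats entry. */
typedef struct DemoSubEntry
{
    DemoTable  *suberrors;      /* NULL when no errors have been recorded */
} DemoSubEntry;

/* Stand-in for an entry-count helper; it must not be given a NULL table. */
static long
demo_num_entries(const DemoTable *tab)
{
    assert(tab != NULL);
    return tab->nentries;
}

/* Sum error counts over all entries, skipping those without an error table. */
static long
demo_total_errors(const DemoSubEntry *entries, size_t n)
{
    long        total = 0;

    for (size_t i = 0; i < n; i++)
    {
        long        nerrors;

        /* Check for NULL first, and only then query the table. */
        if (entries[i].suberrors == NULL)
            continue;
        nerrors = demo_num_entries(entries[i].suberrors);
        total += nerrors;
    }

    return total;
}
```

With the declaration and the call separated like this, an entry whose
suberrors pointer is NULL is skipped before anything dereferences it.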

>
>
> (12)
> Typo: legnth -> length
>
> + * contains the fixed-legnth error message string which is

Fixed.

>
>
> src/backend/postmaster/pgstat.c
>
> (13)
> "Subscription stat entries" hashtable is created in two different
> places, one with HASH_CONTEXT and the other without. Is this
> intentional?
> Shouldn't there be a single function for creating this?

Yes, it's intentional. It's consistent with hash tables for other statistics.

>
>
> (14)
> + * PgStat_MsgSubscriptionPurge Sent by the autovacuum purge the subscriptions.
>
> Seems to be missing a word, is it meant to say "Sent by the autovacuum
> to purge the subscriptions." ?

Yes, fixed.

>
> (15)
> + * PgStat_MsgSubscriptionErrPurge Sent by the autovacuum purge the subscription
> + * errors.
>
> Seems to be missing a word, is it meant to say "Sent by the autovacuum
> to purge the subscription errors." ?

Thanks, fixed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
