Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-10-28 10:40:24
Message-ID: CAA4eK1J+PDjw_E6tT_AaCn+WSfMhJ1vC+ZuAwBYCSQzc7x9Yxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > >
> > > I've attached updated patches.
>
> Thank you for the comments!
>
> >
> > Few comments:
> > ==============
> > 1. Is the patch cleaning tablesync error entries except via vacuum? If
> > not, can't we send a message to remove tablesync errors once tablesync
> > is successful (say when we reset skip_xid or when tablesync is
> > finished) or when we drop subscription? I think the same applies to
> > apply worker. I think we may want to track it in some way whether an
> > error has occurred before sending the message but relying completely
> > on a vacuum might be the recipe of bloat. I think in the case of a
> > drop subscription we can simply send the message as that is not a
> > frequent operation. I might be missing something here because in the
> > tests after drop subscription you are expecting the entries from the
> > view to get cleared
>
> Yes, I think we can have tablesync worker send a message to drop stats
> once tablesync is successful. But if we do that also when dropping a
> subscription, I think we need to do that only the transaction is
> committed since we can drop a subscription that doesn't have a
> replication slot and rollback the transaction. Probably we can send
> the message only when the subscritpion does have a replication slot.
>

Right. And probably for apply worker after updating skip xid.

> In other cases, we can remember the subscriptions being dropped and
> send the message to drop the statistics of them after committing the
> transaction but I’m not sure it’s worth having it.
>

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

FWIW, we completely
> rely on pg_stat_vacuum_stats() for cleaning up the dead tables and
> functions. And we don't expect there are many subscriptions on the
> database.
>

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

> >
> > 5.
> > +# Check if the view doesn't show any entries after dropping the subscriptions.
> > +$node_subscriber->safe_psql(
> > + 'postgres',
> > + q[
> > +DROP SUBSCRIPTION tap_sub;
> > +DROP SUBSCRIPTION tap_sub_streaming;
> > +]);
> > +$result = $node_subscriber->safe_psql('postgres',
> > + "SELECT count(1) FROM pg_stat_subscription_workers");
> > +is($result, q(0), 'no error after dropping subscription');
> >
> > Don't we need to wait after dropping the subscription and before
> > checking the view as there might be a slight delay in messages to get
> > cleared?
>
> I think the test always passes without waiting for the statistics to
> be updated since we fetch the subscription worker statistics from the
> stats collector based on the entries of pg_subscription catalog. So
> this test checks if statistics of already-dropped subscription doesn’t
> show up in the view after DROP SUBSCRIPTION, but does not check if the
> subscription worker statistics entry in the stats collector gets
> removed. The primary reason is that as I mentioned above, the patch
> relies on pgstat_vacuum_stat() for cleaning up the dead subscriptions.
>

That makes sense.

> >
> > 7.
> > +# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
> > +# infinite error due to violating the unique constraint.
> > +my $appname = 'tap_sub';
> > +$node_subscriber->safe_psql(
> > + 'postgres',
> > + "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
> > two_phase = on);");
> > +my $appname_streaming = 'tap_sub_streaming';
> > +$node_subscriber->safe_psql(
> > + 'postgres',
> > + "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
> > '$publisher_connstr application_name=$appname_streaming' PUBLICATION
> > tap_pub_streaming WITH (streaming = on, two_phase = on);");
> > +
> > +$node_publisher->wait_for_catchup($appname);
> > +$node_publisher->wait_for_catchup($appname_streaming);
> >
> > How can we ensure that subscriber would have caught up when one of the
> > tablesync workers is constantly in the error loop? Isn't it possible
> > that the subscriber didn't send the latest lsn feedback till the table
> > sync worker is finished?
> >
>
> I thought that even if tablesync for a table is still ongoing, the
> apply worker can apply commit records, update write LSN and flush LSN,
> and send the feedback to the wal sender. No?
>

You are right, this case will work.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-10-28 10:46:51 Re: Skipping logical replication transactions on subscriber side
Previous Message Andrew Bille 2021-10-28 10:30:21 Re: [Proposal] Global temporary tables