Re: Excessive number of replication slots for 12->14 logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-08-26 01:34:12
Message-ID: CAA4eK1JK0zs0H27-i3Pko2na-xt2vrO2r6Y3mn+sfr9CWzobBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Aug 25, 2022 at 7:38 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Wed, Aug 24, 2022 at 3:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Aug 23, 2022 at 7:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Aug 18, 2022 at 11:28 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Aug 18, 2022 at 3:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > Looks good.
> > > > >
> > > > > I have one minor comment:
> > > > >
> > > > > - * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
> > > > > - * concurrently try to drop the origin and by this time
> > > > > - * the origin might be already removed. For these reasons,
> > > > > - * passing missing_ok = true.
> > > > > + * SUBREL_STATE_FINISHEDCOPY. So passing missing_ok = true.
> > > > >
> > > > > I think we should change "the apply worker" to "the tablesync worker"
> > > > > but should not remove this sentence. The fact that another process
> > > > > could concurrently try to drop the origin is still true.
> > > > >
> > > > > The rest looks good to me.
> > > > >
> > > >
> > > > Updated as described.
> > > >
> > >
> > > The patch looks good to me though I would like to test it a bit more
> > > before pushing.
> > >
> >
> > While testing/reviewing it further, I noticed that the patch has used
> > missing_ok as true when dropping origin via tablesync worker. I don't
> > think that is correct because the concurrent operations that remove
> > origin like a refresh for the subscription take an access exclusive
> > lock on pg_subscription which prevent the previous operation to update
> > the rel state to SUBREL_STATE_SYNCDONE to succeed. So, I think we
> > should pass missing_ok as false which would be consistent with slot
> > handling. I have changed that and comments a few places. What do you
> > think of the attached?
> >
>
> Yes, this makes sense. I tested with a debugger to create simultaneous
> alter subscription refresh publication AND tablesync worker
> deleting the tracking origin. What I see is that the alter subscription command
> gets the lock on SubscriptionRelRelationId and then actually goes on to
> kill the tablesync worker. Even if the tablesync worker is waiting for
> the same lock,
> it looks like the SIGTERM is being handled and the worker is killed.
>
> From my testing, there doesn't seem to be a case where the tablesync worker
> would try and drop a previously dropped origin. So, I think we can pass
> missing_ok as 'false' while dropping origin. The patch looks good to me.
>

Thanks for the testing. I'll push this sometime early next week (by
Tuesday) unless Sawada-San or someone else has any comments on it.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-08-26 04:07:14 BUG #17596: "invalid attribute number 11" when updating partitioned table with a MULTIEXPR_SUBLINK
Previous Message Scott Zelenka 2022-08-26 00:22:15 Re: BUG #17593: min key size 112 bits from FIPS SP800-131 requirement for HMAC-SHA raises exception in SCRAM-SHA-256