Re: Excessive number of replication slots for 12->14 logical replication

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-08-25 14:07:37
Message-ID: CAFPTHDbY_DOr4zWQzNR5ZaC1eEWz5JtFdH1mp2jKbT83CbaRuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Aug 24, 2022 at 3:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Aug 23, 2022 at 7:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Aug 18, 2022 at 11:28 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Aug 18, 2022 at 3:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > Looks good.
> > > >
> > > > I have one minor comment:
> > > >
> > > > - * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
> > > > - * concurrently try to drop the origin and by this time
> > > > - * the origin might be already removed. For these reasons,
> > > > - * passing missing_ok = true.
> > > > + * SUBREL_STATE_FINISHEDCOPY. So passing missing_ok = true.
> > > >
> > > > I think we should change "the apply worker" to "the tablesync worker"
> > > > but should not remove this sentence. The fact that another process
> > > > could concurrently try to drop the origin is still true.
> > > >
> > > > The rest looks good to me.
> > > >
> > >
> > > Updated as described.
> > >
> >
> > The patch looks good to me though I would like to test it a bit more
> > before pushing.
> >
>
> While testing/reviewing it further, I noticed that the patch has used
> missing_ok as true when dropping origin via tablesync worker. I don't
> think that is correct because the concurrent operations that remove
> origin like a refresh for the subscription take an access exclusive
> lock on pg_subscription which prevent the previous operation to update
> the rel state to SUBREL_STATE_SYNCDONE to succeed. So, I think we
> should pass missing_ok as false which would be consistent with slot
> handling. I have changed that and comments a few places. What do you
> think of the attached?
>

Yes, this makes sense. I tested with a debugger to create simultaneous
alter subscription refresh publication AND tablesync worker
deleting the tracking origin. What I see is that the alter subscription command
gets the lock on SubscriptionRelRelationId and then actually goes on to
kill the tablesync worker. Even if the tablesync worker is waiting for
the same lock,
it looks like the SIGTERM is being handled and the worker is killed.

From my testing, there doesn't seem to be a case where the tablesync worker
would try and drop a previously dropped origin. So, I think we can pass
missing_ok as 'false' while dropping origin. The patch looks good to me.

regards,
Ajin Cherian
Fujitsu Australia

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-08-25 17:26:08 BUG #17593: min key size 112 bits from FIPS SP800-131 requirement for HMAC-SHA raises exception in SCRAM-SHA-256
Previous Message hubert depesz lubaczewski 2022-08-25 12:44:11 pg_restore deadlocks with itself