Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Lakshmi Narayanan Sreethar <lakshmi(at)timescale(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date: 2023-01-20 05:24:58
Message-ID: CALDaNm3hL_1o0GwQnFEazgezrcZGJK7e1TiFA86o=rie5gGD6g@mail.gmail.com
Lists: pgsql-bugs

On Thu, 19 Jan 2023 at 12:44, houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, January 18, 2023 12:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
> > >
> > > > There is an analysis of the test
> > > > failure in the email [2] which explains the race condition that
> > > > leads to test failure. Thinking again about the failure, I feel we
> > > > can instead change the failed test (t/004_sync.pl) to either ensure
> > > > that both the walsenders (corresponding to sync worker and apply
> > > > worker) exit after dropping the subscription and before checking
> > > > the remaining slots on publisher or wait for slots to become zero in
> > > > the test.
> > >
> > > How about waiting for the table to start to be synced (and thus the
> > > slot to be created) before issuing the drop subscription?
> > >
> >
> > In this test [1], the initial sync fails due to a unique constraint violation, so
> > checking that the sync has started is a bit tricky. We can probably check
> > sync_error_count in pg_stat_subscription_stats to ensure that sync has started to
> > fail which will ideally ensure that the sync has started. I am not sure this would be
> > completely safe. The other possible ways are (a) after creating a subscription,
> > wait for two slots to get created in the publisher, and then after dropping
> > subscription wait for slots to become zero on the publisher; (b) after dropping
> > the subscription, wait for slots to become zero.
> >
> > I think one of (a) or (b) will work.
>
> I think in the mentioned test case, the tablesync worker will keep restarting, which
> means the table sync slot is also being dropped and re-created ... . So, (a) waiting for
> two slots to get created might not work, as the slot will get dropped soon. I
> think (b) waiting for the slots to become zero would be a simpler way to make the test
> stable. Here are the patches that try to do this for all affected branches.

Thanks for the patch. The deadlock issue is resolved with the shared
patch. I tested on HEAD, PG15 and PG14, found no issues, and
make check-world passes.
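
For readers following along, option (b) above amounts to a poll-until-zero
loop run after DROP SUBSCRIPTION. The following is only a minimal sketch of
that idea in Python, not the attached patch: the function name
wait_for_zero_slots, the FakePublisher stand-in and the timeout values are
all hypothetical. In a real test the count_slots callback would run
something like "SELECT count(*) FROM pg_replication_slots" on the publisher.

```python
import time
from typing import Callable, Iterable


def wait_for_zero_slots(
    count_slots: Callable[[], int],
    timeout_s: float = 180.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll until count_slots() returns 0 or the timeout expires.

    Returns True if the slot count reached zero, False on timeout.
    The count is checked once before any sleeping happens.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_slots() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakePublisher:
    """Hypothetical stand-in for a publisher node.

    It replays a fixed sequence of slot counts, which mimics a tablesync
    slot being dropped and re-created before everything finally goes away.
    The last value in the sequence is repeated forever.
    """

    def __init__(self, counts: Iterable[int]) -> None:
        self._counts = list(counts)

    def count_slots(self) -> int:
        if len(self._counts) > 1:
            return self._counts.pop(0)
        return self._counts[0]


if __name__ == "__main__":
    # Slot count flaps between 2 and 1 before dropping to 0.
    publisher = FakePublisher([2, 1, 2, 1, 0])
    print(wait_for_zero_slots(publisher.count_slots, timeout_s=5.0))
```

Waiting only for the final state (zero slots) avoids depending on a
transient state such as "two slots exist", which is the concern raised
against option (a).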

Regards,
Vignesh
