Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Lakshmi Narayanan Sreethar <lakshmi(at)timescale(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date: 2023-01-18 04:32:27
Message-ID: CAA4eK1K+mpN-nz4j2WobyFjJAtLzV2pzjb_QZf0yATjVM6dOtQ@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
>
> > There is an analysis of the test
> > failure in the email [2] which explains the race condition that leads
> > to test failure. Thinking again about the failure, I feel we can
> > instead change the failed test (t/004_sync.pl) to either ensure that
> > both the walsenders (corresponding to sync worker and apply worker)
> > exits after dropping the subscription and before checking the
> > remaining slots on publisher or wait for slots to become zero in the
> > test.
>
> How about waiting for the table to start to be synced (and thus the slot to be
> created) before issuing the drop subscription?
>

In this test [1], the initial sync fails due to a unique constraint
violation, so checking that the sync has started is a bit tricky. We
could probably check sync_error_count in pg_stat_subscription_stats to
confirm that the sync has started to fail, which should in turn imply
that the sync has started. However, I am not sure this would be
completely safe. The other possible ways are (a) after creating the
subscription, wait for two slots to be created on the publisher, and
then after dropping the subscription wait for the slot count to become
zero on the publisher; or (b) after dropping the subscription, just
wait for the slot count to become zero.

I think one of (a) or (b) will work.
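
A minimal sketch of how (a) and (b) might look in the test, assuming the
node objects are PostgreSQL::Test::Cluster instances and using its
poll_query_until() method. The helper name wait_for_slot_count is
hypothetical, and the query counts every slot on the publisher, which is
only valid if the test creates no other slots.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of 004_sync.pl): block until the node
# reports exactly $expected replication slots. $node is assumed to be a
# PostgreSQL::Test::Cluster object; poll_query_until() retries the query
# until it returns 't' or the timeout expires, and returns 1 on success.
sub wait_for_slot_count
{
    my ($node, $expected) = @_;
    return $node->poll_query_until('postgres',
        "SELECT count(*) = $expected FROM pg_replication_slots");
}

# Intended use inside the test ($node_publisher and the DROP statement are
# assumptions about the surrounding test file):
#
#   wait_for_slot_count($node_publisher, 2);   # (a) only: after CREATE SUBSCRIPTION
#   $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
#   wait_for_slot_count($node_publisher, 0);   # (a) and (b): after DROP
```

Option (b) is just the last call on its own. Option (a) adds the first
wait so that both slots are known to exist before the subscription is
dropped.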

[1]
# Table tap_rep already has the same records on both publisher and subscriber
# at this time. Recreate the subscription which will do the initial copy of
# the table again and fails due to unique constraint violation.
$node_subscriber->safe_psql('postgres',
    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
    PUBLICATION tap_pub"
);
...
...

--
With Regards,
Amit Kapila.
