Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Lakshmi Narayanan Sreethar <lakshmi(at)timescale(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date: 2023-01-17 20:04:32
Message-ID: 20230117200432.xaoenn7ni7srb2l2@awork3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
> As per my initial analysis, I have added this code to hold/resume
> interrupts during slot creation due to the test failure (in buildfarm)
> reported in the email [1]. It is clearly a wrong fix as per the report
> and discussion in this thread.

Yea. You really can never hold interrupts across something that could block
indefinitely. A HOLD_INTERRUPTS() while doing error recovery (as in
DisableSubscriptionAndExit()) is fine; that's basically a finite amount of
work. But doing so while issuing SQL commands to another node, or anything
else that could just block indefinitely, isn't.
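
To make the distinction concrete, here is a minimal standalone sketch, not PostgreSQL code. It models interrupt holdoff as a counter and a pending flag; every name in it (hold_interrupts, check_for_interrupts, wait_for_remote, reply_after) is hypothetical and chosen only for illustration. It assumes, as a simplification, that a pending request is serviced only when the holdoff count is zero, and it stands in for "waiting on another node" with a loop of polls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a backend's interrupt state; all names are hypothetical. */
static volatile bool interrupt_pending = false;
static int  holdoff_count = 0;
static bool terminated = false;

static void hold_interrupts(void)
{
    holdoff_count++;
}

static void resume_interrupts(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}

/* Service a pending request unless interrupts are currently held. */
static void check_for_interrupts(void)
{
    if (interrupt_pending && holdoff_count == 0)
    {
        interrupt_pending = false;
        terminated = true;
    }
}

/*
 * Simulate a wait that ends only when the remote side answers (after
 * reply_after polls) or when a request to stop is serviced. A stop request
 * arrives during the first poll. Returns the number of polls spent waiting.
 */
static int wait_for_remote(bool hold, int reply_after)
{
    int polls = 0;

    interrupt_pending = false;
    holdoff_count = 0;
    terminated = false;

    if (hold)
        hold_interrupts();

    while (polls < reply_after && !terminated)
    {
        polls++;
        if (polls == 1)
            interrupt_pending = true;   /* someone asks this process to stop */
        check_for_interrupts();
    }

    if (hold)
    {
        resume_interrupts();
        check_for_interrupts();         /* only now is the request noticed */
    }
    return polls;
}
```

Calling wait_for_remote(false, 1000) stops after the first poll, while wait_for_remote(true, 1000) keeps waiting for all 1000 polls before the request is noticed. If the remote side never answers, the held variant never gets there at all, which is why holding is only acceptable around a bounded amount of work.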

> There is an analysis of the test
> failure in the email [2] which explains the race condition that leads
> to test failure. Thinking again about the failure, I feel we can
> instead change the failed test (t/004_sync.pl) to either ensure that
> both the walsenders (corresponding to sync worker and apply worker)
> exits after dropping the subscription and before checking the
> remaining slots on publisher or wait for slots to become zero in the
> test.

How about waiting for the table to start to be synced (and thus the slot to be
created) before issuing the drop subscription? If the slot hadn't yet been
created, the test doesn't prove that we successfully clean up...

Greetings,

Andres Freund
