From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION |
Date: | 2023-11-22 11:21:25 |
Message-ID: | 69e9a1af-a315-8b05-0674-2cc53bf27682@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/22/23 11:38, Amit Kapila wrote:
> On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 11/21/23 14:16, Amit Kapila wrote:
>>> On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>
>>>
>>> It seems there is some inconsistency in what you have written for
>>> client backends/tablesync worker vs. apply worker. The above text
>>> seems to be saying that the client backend and table sync worker are
>>> waiting on a "subscription row in pg_subscription" and the apply
>>> worker is operating on "pg_subscription_rel". So, if that is true then
>>> they shouldn't get stuck.
>>>
>>> I think here client backend and tablesync worker seems to be blocked
>>> for a lock on pg_subscription_rel.
>>>
>>
>> Not really, they are all locking the subscription. All the locks are on
>> classid=6100, which is pg_subscription:
>>
>> test=# select 6100::regclass;
>> regclass
>> -----------------
>> pg_subscription
>> (1 row)
>>
>> The thing is, the tablesync workers call UpdateSubscriptionRelState,
>> which locks the pg_subscription catalog at the very beginning:
>>
>> LockSharedObject(SubscriptionRelationId, ...);
>>
>> So that's the issue. I haven't explored why it's done this way, and
>> there's no comment explaining locking the subscriptions is needed ...
>>
>
> I think it prevents concurrent drop of rel during the REFRESH operation.
>
Yes. Or maybe some other concurrent DDL on the relations included in the
subscription.
>>>> The tablesync workers can't proceed because their lock request is stuck
>>>> behind the AccessExclusiveLock request.
>>>>
>>>> And the apply worker can't proceed, because it's waiting for status
>>>> update from the tablesync workers.
>>>>
>>>
>>> This part is not clear to me because
>>> wait_for_relation_state_change()->GetSubscriptionRelState() seems to
>>> be releasing the lock while closing the relation. Am, I missing
>>> something?
>>>
>>
>> I think you're missing the fact that GetSubscriptionRelState() acquires
>> and releases the lock on pg_subscription_rel, but that's not the lock
>> causing the issue. The problem is the lock on the pg_subscription row.
>>
>
> Okay. IIUC, what's going on here is that the apply worker acquires
> AccessShareLock on pg_subscription to update rel state for one of the
> tables say tbl-1, and then for another table say tbl-2, it started
> waiting for a state change via wait_for_relation_state_change(). I
> think here the fix is to commit the transaction before we go for a
> wait. I guess we need something along the lines of what is proposed in
> [1] though we have solved the problem in that thread in some other
> way..
>
Possibly. I haven't checked if the commit might have some unexpected
consequences, but I can confirm I can no longer reproduce the deadlock
with the patch applied.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2023-11-22 11:30:33 | Re: Changing baserel to foreignrel in postgres_fdw functions |
Previous Message | Amit Kapila | 2023-11-22 10:38:49 | Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION |