From: | lingce(dot)ldm <lingce(dot)ldm(at)alibaba-inc(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Problem with synchronous replication |
Date: | 2019-10-30 06:27:46 |
Message-ID: | 1A3B323A-782E-4204-8396-0BCDA4695827@alibaba-inc.com |
Lists: | pgsql-hackers |
On Oct 30, 2019, at 09:45, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce(dot)ldm(at)alibaba-inc(dot)com> wrote in
>>> I recently discovered two possible bugs in synchronous replication.
>>>
>>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted.
>>> SyncRepCleanupAtProcExit first checks whether the element is detached from the queue; if it
>>> is not, it acquires SyncRepLock and deletes it. If the walsender has removed the element in
>>> between, it is deleted a second time, and SHMQueueDelete crashes with a segmentation fault.
>>>
>>> IMO, as SyncRepCancelWait does, we should acquire SyncRepLock first and then check
>>> whether the element is detached.
>>
>> I think you're right here.
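A minimal, self-contained sketch of the lock-then-check pattern proposed for point 1. It does not use PostgreSQL's headers: `QueueLink`, `queue_delete()`, `walsender_release()`, `cleanup_at_proc_exit()` and the pthread mutex standing in for SyncRepLock are hypothetical simplifications of SHM_QUEUE, SHMQueueDelete() and the real LWLock. Because the detached check and the delete happen under the same lock, an element the walsender has already removed is skipped instead of being unlinked twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for SHM_QUEUE: a circular doubly-linked list whose
 * head is a sentinel. A detached element has prev == NULL, mirroring
 * SHMQueueIsDetached(). */
typedef struct QueueLink
{
    struct QueueLink *prev;
    struct QueueLink *next;
} QueueLink;

/* Hypothetical stand-in for SyncRepLock. */
static pthread_mutex_t sync_rep_lock = PTHREAD_MUTEX_INITIALIZER;

static void queue_init(QueueLink *head)
{
    head->prev = head->next = head;
}

/* Append elem at the tail (just before the sentinel head). */
static void queue_insert_before(QueueLink *head, QueueLink *elem)
{
    elem->next = head;
    elem->prev = head->prev;
    head->prev->next = elem;
    head->prev = elem;
}

static bool queue_is_detached(const QueueLink *elem)
{
    return elem->prev == NULL;
}

/* Like SHMQueueDelete(): unlink elem and mark it detached. A second call
 * on the same element would dereference NULL; the assert catches that in
 * debug builds. */
static void queue_delete(QueueLink *elem)
{
    assert(elem->prev != NULL && elem->next != NULL);
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    elem->prev = elem->next = NULL;
}

/* Walsender side: removes a queued waiter while holding the lock. */
static void walsender_release(QueueLink *elem)
{
    pthread_mutex_lock(&sync_rep_lock);
    queue_delete(elem);
    pthread_mutex_unlock(&sync_rep_lock);
}

/* Backend exit side: the detached check is made while holding the lock,
 * so the walsender cannot remove the element between check and delete.
 * Returns true if this call removed the element. */
static bool cleanup_at_proc_exit(QueueLink *elem)
{
    bool removed = false;

    pthread_mutex_lock(&sync_rep_lock);
    if (!queue_is_detached(elem))
    {
        queue_delete(elem);
        removed = true;
    }
    pthread_mutex_unlock(&sync_rep_lock);
    return removed;
}
```

If taking the lock on every backend exit is a concern, a variant is to do an unlocked detached check first and then recheck under the lock before deleting; the recheck is what prevents the double delete.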
>
> Indeed. Looking at the surroundings, we expect some code paths to hold
> SyncRepLock in exclusive or shared mode, but we don't actually check
> that the lock is held. So let's add some assertions while at it.
>
>> This is not right. It is in transaction commit so it is in a
>> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
>> cancel/die interrupts, so the ereport should return.
>
> Yeah. There is an easy way to check after that: InterruptHoldoffCount
> needs to be strictly positive.
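A simplified, single-threaded model of why the ereport returns here. It assumes, as in PostgreSQL's errfinish(), that a report below ERROR level checks for interrupts before returning, and that ProcessInterrupts() returns immediately while InterruptHoldoffCount is nonzero. The global names mirror PostgreSQL's, but this is stand-in code; `emit_warning()`, `commit_wait_handle_cancel()` and `cancels_serviced` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for PostgreSQL's interrupt globals
 * (assumption: single-threaded, no signal handlers). */
static volatile unsigned int InterruptHoldoffCount = 0;
static volatile bool QueryCancelPending = false;
static int cancels_serviced = 0; /* hypothetical: counts serviced cancels */

#define HOLD_INTERRUPTS() (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { assert(InterruptHoldoffCount > 0); InterruptHoldoffCount--; } while (0)

/* Mirrors the early exit in ProcessInterrupts(): while interrupts are held,
 * a pending cancel stays pending and control returns to the caller.
 * Returns true if a cancel was serviced (the real code would throw). */
static bool process_interrupts(void)
{
    if (InterruptHoldoffCount != 0)
        return false;
    if (QueryCancelPending)
    {
        QueryCancelPending = false;
        cancels_serviced++;
        return true;
    }
    return false;
}

/* Stand-in for ereport(WARNING): checks for interrupts on the way out. */
static bool emit_warning(const char *msg)
{
    (void) msg;
    return process_interrupts();
}

/* Hypothetical cancel branch of the commit-time wait: the code relies on
 * running inside HOLD_INTERRUPTS(), and the assertion makes that explicit. */
static bool commit_wait_handle_cancel(void)
{
    assert(InterruptHoldoffCount > 0);
    return emit_warning("canceling wait for synchronous replication");
}
```

With interrupts held, the warning returns normally and the cancel remains pending until RESUME_INTERRUPTS() brings the count back to zero.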
>
> My suggestions are attached. Any thoughts?
Thanks, this patch looks good to me.