From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | lingce(dot)ldm(at)alibaba-inc(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Problem with synchronous replication |
Date: | 2019-10-30 03:34:28 |
Message-ID: | 20191030.123428.18823202335157111.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce(dot)ldm(at)alibaba-inc(dot)com> wrote in
> >> I recently discovered two possible bugs about synchronous replication.
> >>
> >> 1. SyncRepCleanupAtProcExit may delete an element that has been deleted
> >> SyncRepCleanupAtProcExit first checks whether the queue is detached, if it is not detached,
> >> acquires the SyncRepLock lock and deletes it. If this element has been deleted by walsender,
> >> it will be deleted repeatedly, SHMQueueDelete will core with a segment fault.
> >>
> >> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check
> >> whether the queue is detached or not.
> >
> > I think you're right here.
>
> Indeed. Looking at the surroundings we expect some code paths to hold
> SyncRepLock in exclusive or shared mode but we don't actually check
> that the lock is hold. So let's add some assertions while on it.
I looked around closer.
If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert() and caution in their comments would be
enough.
On the other hand, the similar-looking code in SyncRepInitConfig and
SyncRepUpdateSyncStandbysDefined seems safe since AFAICS it doesn't
have (this kind of) racing condition on wirtes. It might need a
comment like that. Or, we could go to (apparently) safer-side by
applying the same amendment to it.
SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with wrong
priority. I'm not sure it matters, though.
> > This is not right. It is in transaction commit so it is in a
> > HOLD_INTERRUPTS section. ProcessInterrupt does not respond to
> > cancel/die interrupts thus the ereport should return.
>
> Yeah. There is an easy way to check after that: InterruptHoldoffCount
> needs to be strictly positive.
>
> My suggestions are attached. Any thoughts?
Seems reasonable for holdoffs. The same assertion would be needed in
more places but it's another issue.
By the way while I was looking this, I found a typo. Please find the
attached.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
syncrep_comment_typo.patch | text/x-patch | 644 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2019-10-30 03:50:16 | Re: Typos and inconsistencies in code |
Previous Message | Michael Paquier | 2019-10-30 03:11:28 | Re: [PATCH] Do not use StdRdOptions in Access Methods |