From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | wangchuanting <wangchuanting(at)huawei(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog |
Date: | 2017-06-11 02:25:36 |
Message-ID: | 20170611022536.goef4cdbmgicye5g@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
Michael Paquier wrote:
> > i look carefully on the patch, because of we removed TwoPhaseStateLock
> > lwlock acquire in `RemoveGXact' and let caller held lwlock, so i think:
> > 1. xact_redo also need held lwlock before call PrepareRedoRemove
> > 2. RecoverPreparedTransactions also need held lwlock before call
> > ProcessTwoPhaseBuffer
>
> Thanks for the input. I have reviewed all the code paths that have
> been modified, and strengthened the code with assertions using
> LWLockHeldByMeInMode() to make sure that the correct lock is always
> hold in those code paths. There is actually no point in holding the
> lock in restoreTwoPhaseData(), but as this makes the code less
> consistent with the rest I added one. Also, I have replaced the lock
> acquisition in PrepareRedoAdd() with acquisitions at higher levels,
> and added an assertion in this routine. This makes the 2PC state data
> addition and removal more consistent with each other.
Thanks for the patch -- I agree that the new coding looks better.
I also agree that restoreTwoPhaseData doesn't need to hold the lock,
since we don't expect anything running concurrently, but that it's okay
to hold it and makes the whole thing simpler.
We could try to apply an equivalent rationale to
RecoverPreparedTransactions: even though we have already been running in
HOT standby mode for a while, there's no possibility of concurrent
activity either: since we exited recovery, master cannot write any more
2PC xacts, and we haven't yet set the flag that lets open sessions write
WAL. However, it seems mildly dangerous to run the bottom sections of
the loop with the lock held, since it acquires other lwlocks and
generally does a lot of crap.
Also, let's add an lwlock-is-held assertion to MarkAsPreparingGuts.
BTW: I think that saving one redundant line of code is not worth the
ugliness that it costs in xact_redo, so let's undo that. The patch is a
bit bulky, but the resulting code is simpler.
In short, I propose the attached.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
2pc-redo-lwlock-fix-v4.patch | text/plain | 12.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2017-06-11 03:24:48 | Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog |
Previous Message | Alvaro Herrera | 2017-06-11 00:22:24 | Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2017-06-11 03:24:48 | Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog |
Previous Message | Tom Lane | 2017-06-11 01:31:25 | Re: memory fields from getrusage() |