From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Assertion failure in SnapBuildInitialSnapshot() |
Date: | 2023-02-09 06:32:01 |
Message-ID: | CAD21AoDi1fGGpie3vpxaHNiRdbsac2pJBbZAiLBay+Q=WArbRg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Feb 8, 2023 at 1:13 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Feb 8, 2023 at 1:19 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Attached updated patches.
> > > >
> > >
> > > Thanks, Andres, others, do you see a better way to fix this problem? I
> > > have reproduced it manually and the steps are shared at [1] and
> > > Sawada-San also reproduced it, see [2].
> > >
> > > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> >
> > Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> > the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > non-negligible frequency. Callers like CreateInitDecodingContext(), that pass
> > already_locked=true worry me a lot less, because obviously that's not a very
> > frequent operation.
> >
> > This is particularly not great because we need to acquire
> > ReplicationSlotControlLock while already holding ProcArrayLock.
> >
> >
> > But clearly there's a pretty large hole in the lock protection right now. I'm
> > a bit confused about why we (Robert and I, or just I) thought it's ok to do it
> > this way.
> >
> >
> > I wonder if we could instead invert the locks, and hold
> > ReplicationSlotControlLock until after ProcArraySetReplicationSlotXmin(), and
> > acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin().
> >
>
> Along with inverting, doesn't this mean that we need to acquire
> ReplicationSlotControlLock in Exclusive mode instead of acquiring it
> in shared mode? My understanding of the above locking scheme is that
> in CreateInitDecodingContext(), we acquire ReplicationSlotControlLock
> in Exclusive mode before acquiring ProcArrayLock in Exclusive mode and
> release it after releasing ProcArrayLock. Then,
> ReplicationSlotsComputeRequiredXmin() acquires
> ReplicationSlotControlLock in Exclusive mode only when already_locked
> is false and releases it after a call to
> ProcArraySetReplicationSlotXmin(). ProcArraySetReplicationSlotXmin()
> won't change.
I've attached a patch implementing this idea for discussion. In
GetOldestSafeDecodingTransactionId(), called by
CreateInitDecodingContext(), we would hold ReplicationSlotControlLock,
ProcArrayLock, and XidGenLock at the same time, so we need to be
careful about the lock ordering.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
fix_concurrent_slot_xmin_update.patch | application/octet-stream | 3.9 KB |