From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Assertion failure in SnapBuildInitialSnapshot() |
Date: | 2023-02-07 20:05:20 |
Message-ID: | 20230207200520.znim32a66b4ca7iw@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2023-02-07 11:49:03 -0800, Andres Freund wrote:
> On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Attached updated patches.
> > >
> >
> > Thanks, Andres, others, do you see a better way to fix this problem? I
> > have reproduced it manually and the steps are shared at [1] and
> > Sawada-San also reproduced it, see [2].
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
>
> Hm. It's worrysome to now hold ProcArrayLock exclusively while iterating over
> the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> non-neglegible frequency. Callers like CreateInitDecodingContext(), that pass
> already_locked=true worry me a lot less, because obviously that's not a very
> frequent operation.
Separately from this change:
I wonder if we ought to change the setup in CreateInitDecodingContext() to be a
bit less intricate. One idea:
Instead of having GetOldestSafeDecodingTransactionId() compute a value, that
we then enter into a slot, that then computes the global horizon via
ReplicationSlotsComputeRequiredXmin(), we could have a successor to
GetOldestSafeDecodingTransactionId() change procArray->replication_slot_xmin
(if needed).
As long as CreateInitDecodingContext() prevents a concurent
ReplicationSlotsComputeRequiredXmin(), by holding ReplicationSlotControlLock
exclusively, that should suffice to ensure that no "wrong" horizon was
determined / no needed rows have been removed. And we'd not need a lock nested
inside ProcArrayLock anymore.
Not sure if it's sufficiently better to be worth bothering with though :(
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-02-07 20:10:09 | Re: improving user.c error messages |
Previous Message | Andres Freund | 2023-02-07 19:49:03 | Re: Assertion failure in SnapBuildInitialSnapshot() |