Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date: 2022-10-18 00:58:27
Message-ID: CAD21AoAE3j=4SgcY3gLZWwpW+OYV=fumhjeEQmb66AvV27-rrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 17, 2022 at 4:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Oct 12, 2022 at 11:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Please note that to pass the new regression tests, the fix proposed in
> > a related thread[1] is required. Particularly, we need:
> >
> > @@ -1099,6 +1099,9 @@ SnapBuildCommitTxn(SnapBuild *builder,
> > XLogRecPtr lsn, TransactionId xid,
> > else if (sub_needs_timetravel)
> > {
> > /* track toplevel txn as well, subxact alone isn't meaningful */
> > + elog(DEBUG2, "forced transaction %u to do timetravel
> > due to one of its subtransaction",
> > + xid);
> > + needs_timetravel = true;
> > SnapBuildAddCommittedTxn(builder, xid);
> > }
> > else if (needs_timetravel)
> >
> > A side benefit of this approach is that we can fix another assertion
> > failure too that happens on REL14 and REL15 and reported here[2]. In
> > the commits 68dcce247f1a(REL14) and 272248a0c1(REL15), the reason why
> > we make the association between sub-txns to top-txn in
> > SnapBuildXidSetCatalogChanges() is just to avoid the assertion failure
> > in AssertTXNLsnOrder(). However, since the invalidation messages are
> > not transported from sub-txn to top-txn during the assignment, another
> > assertion check in ReorderBufferForget() fails when forgetting the
> > subtransaction. If we apply this idea of skipping the assertion
> > checks, we no longer need to make the such association in
> > SnapBuildXidSetCatalogChanges() and resolve this issue as well.
> >
>
> IIUC, here you are speaking of three different changes. Change-1: Add
> a check in AssertTXNLsnOrder() to skip assert checking till we reach
> start_decoding_at. Change-2: Set needs_timetravel to true in one of
> the else if branches in SnapBuildCommitTxn(). Change-3: Remove the
> call to ReorderBufferAssignChild() from SnapBuildXidSetCatalogChanges
> in PG-14/15 as that won't be required after Change-1.

Yes.

>
> AFAIU, Change-1 is required till v10; Change-2 and Change-3 are
> required in HEAD/v15/v14 to fix the problem.

IIUC Change-2 is required in v16 and HEAD but not mandatory in v15 and
v14. The reason why we need Change-2 is that there is a case where we
mark only subtransactions as containing catalog change while not doing
that for its top-level transaction. In v15 and v14, since we mark both
subtransactions and top-level transaction in
SnapBuildXidSetCatalogChanges() as containing catalog changes, we
don't get the assertion failure at "Assert(!needs_snapshot ||
needs_timetravel)".

Regarding Change-3, it's required in v15 and v14 but not in HEAD and
v16. Since we didn't add SnapBuildXidSetCatalogChanges() to v16 and
HEAD, Change-3 cannot be applied to the two branches.

> Now, the second and third
> changes are not required in branches prior to v14 because we don't
> record invalidations via XLOG_XACT_INVALIDATIONS record. However, if
> we want, we can even back-patch Change-2 and Change-3 to keep the code
> consistent or maybe just Change-3.

Right. I don't think it's a good idea to back-patch Change-2 in
branches prior to v14 as it's not a relevant issue. Regarding
back-patching Change-3 to branches prior 14, I think it may be okay
til v11, but I'd be hesitant for v10 as the final release comes in a
month.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2022-10-18 01:06:14 Re: POC: Better infrastructure for automated testing of concurrency issues
Previous Message Richard Guo 2022-10-18 00:47:35 Re: havingQual vs hasHavingQual buglets