Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date: 2022-10-17 01:34:41
Message-ID: CAD21AoDq+-V+Z3BSkQYuZQkhSmpgaJAzJdOwAZNcyAUnAWq+Lw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 13, 2022 at 4:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Oct 12, 2022 at 11:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Summarizing this issue, the assertion check in AssertTXNLsnOrder()
> > fails as reported because the current logical decoding cannot properly
> > handle the case where the decoding restarts from NEW_CID. Since we
> > don't make the association between top-level transaction and its
> > subtransaction while decoding NEW_CID (ie, in
> > SnapBuildProcessNewCid()), two transactions are created in
> > ReorderBuffer as top-txn and have the same LSN. This failure happens
> > on all supported versions.
> >
> > To fix the problem, one idea is that we make the association between
> > top-txn and sub-txn during that by calling ReorderBufferAssignChild(),
> > as Tomas proposed. On the other hand, since we don't guarantee to make
> > the association between the top-level transaction and its
> > sub-transactions until we try to decode the actual contents of the
> > transaction, it makes sense to me that instead of trying to solve by
> > making association, we need to change the code which are assuming that
> > it is associated.
> >
> > I've attached the patch for this idea. With the patch, we skip the
> > assertion checks in AssertTXNLsnOrder() until we reach the LSN at
> > which we start decoding the contents of transaction, ie.
> > start_decoding_at in SnapBuild. The minor concern is other way that
> > the assertion check could miss some faulty cases where two unrelated
> > top-transactions could have same LSN. With this patch, it will pass
> > for such a case. Therefore, for transactions that we skipped checking,
> > we do the check when we reach the LSN.
> >
>
> >
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -113,6 +113,15 @@
> LogicalDecodingProcessRecord(LogicalDecodingContext *ctx,
> XLogReaderState *recor
> buf.origptr);
> }
>
> +#ifdef USE_ASSERT_CHECKING
> + /*
> + * Check the order of transaction LSNs when we reached the start decoding
> + * LSN. See the comments in AssertTXNLsnOrder() for details.
> + */
> + if (SnapBuildGetStartDecodingAt(ctx->snapshot_builder) == buf.origptr)
> + AssertTXNLsnOrder(ctx->reorder);
> +#endif
> +
> rmgr = GetRmgr(XLogRecGetRmid(record));
> >
>
> I am not able to think how/when this check will be useful. Because we
> skipped assert checking only for records that are prior to
> start_decoding_at point, I think for those records ordering should
> have been checked before the restart. start_decoding_at point will be
> either (a) confirmed_flush location, or (b) lsn sent by client, and
> any record prior to that must have been processed before restart.

Good point. I was considering the case where the client sets far ahead
LSN but it's not worth considering this case in this context. I've
updated the patch accoringly.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-Fix-the-assertion-failure-while-processing-NEW_CI.patch application/octet-stream 6.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2022-10-17 01:45:31 RE: [Proposal] Add foreign-server health checks infrastructure
Previous Message Kyotaro Horiguchi 2022-10-17 01:16:25 Re: Improve description of XLOG_RUNNING_XACTS