From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: test_decoding assertion failure for the loss of top-sub transaction relationship |
Date: | 2022-09-03 04:43:49 |
Message-ID: | CAA4eK1LK1nxOTL32OP=ejhPoBsUP4Bvwb3Ly=fethyJ-KbaXyw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Sep 2, 2022 at 6:26 AM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
>
> I've met an assertion failure of logical decoding with below scenario on HEAD.
>
> ---
> <preparation>
> create table tab1 (val integer);
> select 'init' from pg_create_logical_replication_slot('regression_slot', 'test_decoding');
>
> <session1>
> begin;
> savepoint sp1;
> insert into tab1 values (1);
>
> <session2>
> checkpoint; -- for RUNNING_XACT
> select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
>
> <session1>
> truncate tab1; -- for NEW_CID
> commit;
> begin;
> insert into tab1 values (3);
>
BTW, if I just change the truncate statement to "Analyze tab1" in your
entire test then I am getting a different assertion failure:
postgres.exe!ExceptionalCondition(const char * conditionName, const
char * errorType, const char * fileName, int lineNumber) Line 70 C
postgres.exe!AssertTXNLsnOrder(ReorderBuffer * rb) Line 902 C
postgres.exe!ReorderBufferTXNByXid(ReorderBuffer * rb, unsigned int
xid, bool create, bool * is_new, unsigned __int64 lsn, bool
create_as_top) Line 681 C
postgres.exe!ReorderBufferAddNewTupleCids(ReorderBuffer * rb, unsigned
int xid, unsigned __int64 lsn, RelFileLocator locator, ItemPointerData
tid, unsigned int cmin, unsigned int cmax, unsigned int combocid) Line
3188 C
postgres.exe!SnapBuildProcessNewCid(SnapBuild * builder, unsigned int
xid, unsigned __int64 lsn, xl_heap_new_cid * xlrec) Line 823 C
postgres.exe!heap2_decode(LogicalDecodingContext * ctx,
XLogRecordBuffer * buf) Line 408 C
postgres.exe!LogicalDecodingProcessRecord(LogicalDecodingContext *
ctx, XLogReaderState * record) Line 119 C
postgres.exe!pg_logical_slot_get_changes_guts(FunctionCallInfoBaseData
* fcinfo, bool confirm, bool binary) Line 274 C
postgres.exe!pg_logical_slot_get_changes(FunctionCallInfoBaseData *
fcinfo) Line 339 C
This is matching with call stack we see intermittently in the BF
[1][2]. The difference with your scenario is that the Truncate
statement generates an additional WAL XLOG_STANDBY_LOCK prior to
XLOG_HEAP2_NEW_CID.
I think we can fix this in the below ways:
a. Assert(prev_first_lsn <= cur_txn->first_lsn); -- Explain in
comments that it is possible when subtransaction and transaction are
not previously logged as it happened in this scenario
b. track txn of prev_first_lsn (say as prev_txn) and check if
prev_txn's toptxn is the same as cur_txn or cur_txn's toptxn is the
same as the prev_txn then perform assert mentioned in (a) else, keep
the current Assert.
It seems (b) will be more robust.
Thoughts?
Note: I added Sawada-San as sometime back we had an offlist discussion
on this intermittent BF failure but we were not able to reach the
exact test which can show this failure.
[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-08-20%2002%3A45%3A34
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2022-09-03 05:03:43 | Re: win_flex.exe (and likely win_bison.exe) isn't concurrency safe |
Previous Message | Nathan Bossart | 2022-09-03 04:30:58 | Re: postgres_fdw hint messages |