From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'ocean_li_996' <ocean_li_996(at)163(dot)com> |
Cc: | 'Alexander Lakhin' <exclusion(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "feichanghong(at)qq(dot)com" <feichanghong(at)qq(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: Re:RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder() |
Date: | 2024-03-12 10:22:59 |
Message-ID: | TYCPR01MB12077369E4B9B34979378F435F52B2@TYCPR01MB12077.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Haiyang,
Thanks for checking! This reply still focuses only on "Issue 2" in your notation.
>## Issue 2
>Inspired by your spec case, I've reorganized the spec case provided in [2]. The new test in attachment
>is able to reproduce the issue mentioned in [1] even before commit 6b77048e5.
Good find. I confirmed that the workload also fails with 6b77048e5 reverted,
and that the patch [1] fixes this workload as well.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_create_part1" "s0_savepoint_release"
"s2_init" "s1_checkpoint" "s1_get_changes" "s0_commit" "s2_get_changes"
## Analysis
The point is that a snapshot serialized by another replication slot can be reused.
When the first get_changes is called, a consistent snapshot can be serialized because
of the XLOG_RUNNING_XACTS record (see later).
The get_changes for the second slot reuses that snapshot, so it can read WAL records properly.
(If the first slot did not exist, the state of the snapshot would be
SNAPBUILD_BUILDING_SNAPSHOT, so no records would be read.)
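To make the reuse concrete, here is a toy model in Python of the decision described above. It is only a sketch, not the snapbuild.c logic: the function names, the set of serialized LSNs, and the two-state enum are assumptions made for illustration.

```python
from enum import Enum, auto


class SnapState(Enum):
    # Only the two states relevant to this discussion are modelled.
    BUILDING_SNAPSHOT = auto()
    CONSISTENT = auto()


def state_at_running_xacts(lsn, serialized_lsns, xacts_in_progress):
    """Toy decision for a decoder that reads a RUNNING_XACTS record at `lsn`.

    serialized_lsns: hypothetical set of LSNs at which some slot has
    already serialized a consistent snapshot.
    """
    if not xacts_in_progress:
        # Nothing is running, so the snapshot is consistent right away.
        return SnapState.CONSISTENT
    if lsn in serialized_lsns:
        # A snapshot serialized via another slot is reused.
        return SnapState.CONSISTENT
    # Otherwise the decoder has to keep building its own snapshot.
    return SnapState.BUILDING_SNAPSHOT


def decodes_changes(state):
    """In this model, records are only decoded once the state is consistent."""
    return state is SnapState.CONSISTENT
```

With an open transaction and no serialized snapshot, the model stays in BUILDING_SNAPSHOT and decodes nothing; once the first slot has serialized a snapshot at that LSN, the second slot becomes consistent immediately.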
In the second get_changes, the records below are read. The first (LOCK, RUNNING_XACTS)
pair is generated by the slot creation, and the second pair comes from the
CHECKPOINT. I.e., it reads all records since the slot creation.
```
...lsn: 0/01906DB8, prev 0/01906D58, desc: LOCK ...
...lsn: 0/01906DF0, prev 0/01906DB8, desc: RUNNING_XACTS ...
...lsn: 0/01906E30, prev 0/01906DF0, desc: LOCK ...
...lsn: 0/01906E68, prev 0/01906E30, desc: RUNNING_XACTS ...
...lsn: 0/01906EA8, prev 0/01906E68, desc: CHECKPOINT_ONLINE ...
...lsn: 0/01906F20, prev 0/01906EA8, desc: COMMIT ... subxacts: 728; ... inval msgs: ...
```
Also, the final COMMIT record contains the info for a subtransaction and has the
XACT_XINFO_HAS_INVALS flag, so DecodeCommit()->SnapBuildXidSetCatalogChanges()
is called for the transactions.
After that, two ReorderBufferTXNs are created with the same LSN, which fails the
assertion in AssertTXNLsnOrder().
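The failure mode can be illustrated with a small Python model of a strict ordering check. This is a simplified sketch, assuming the assertion requires each transaction's first LSN to be strictly greater than the previous one; the `Txn` class, the xids, and the LSN values are hypothetical and are not taken from reorderbuffer.c or from the WAL above.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    xid: int        # hypothetical transaction id
    first_lsn: int  # LSN at which the transaction entry was created


def lsn_order_ok(txns):
    """Return True when first_lsn is strictly increasing across txns."""
    prev = None
    for txn in txns:
        if prev is not None and not (prev < txn.first_lsn):
            # Equal LSNs violate a strict '<' check just like reversed ones.
            return False
        prev = txn.first_lsn
    return True


# Two entries created while processing the same record share one LSN.
same_lsn = [Txn(xid=1, first_lsn=300), Txn(xid=2, first_lsn=300)]
distinct = [Txn(xid=1, first_lsn=100), Txn(xid=2, first_lsn=200)]
```

Here `lsn_order_ok(distinct)` holds while `lsn_order_ok(same_lsn)` does not, which corresponds to the case where both entries are created at the same LSN.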
I will update the patch if the above analysis is correct.
>The approach in [3] is also LGFM.
Thanks. I agree that we should avoid relaxing the Assert() condition as much as possible.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/