| From: | ocean_li_996 <ocean_li_996(at)163(dot)com> | 
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | Re:BUG #18369: logical decoding core on AssertTXNLsnOrder() | 
| Date: | 2024-02-28 07:57:37 | 
| Message-ID: | 6d0e80d6.c1fc.18deeb8120a.Coremail.ocean_li_996@163.com | 
| Lists: | pgsql-bugs | 
At 2024-02-28 15:53:30, "PG Bug reporting form" <noreply(at)postgresql(dot)org> wrote:
>1) The WAL records from restart_lsn to the corresponding lsn when the issue
>occurred,
>2) personal analysis of the problem,
>3) the steps to reproduce the issue,
>4) personal proposed solution
>will be posted later under this thread.
>
1) The WAL records from restart_lsn to the LSN at which the issue occurred are provided in attachment 1.
2) As shown in 1), some invalidation messages are generated in top-level xact 19933. After decoding restarts, these invalidation messages cause top-level xact 19933 and its subtransaction(s) to be marked as containing catalog changes while its commit record is processed (see SnapBuildXidSetCatalogChanges()). In this step, the subxacts that were never processed before are added to the ReorderBuffer with the same first_lsn as the top-level xact. The check in AssertTXNLsnOrder() then fails if there is more than one such subxact.
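To illustrate point 2), here is a toy model of the ordering problem. It is not PostgreSQL code: ToyTxnList, toy_list_append() and toy_lsn_order_ok() are hypothetical names, and the LSN values are made up. It only assumes that the check requires strictly increasing first_lsn values along the list, so one new entry at a given LSN passes while a second entry at the same LSN does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

#define TOY_MAX_TXNS 8

typedef struct ToyTxnList
{
    ToyLsn first_lsn[TOY_MAX_TXNS]; /* first_lsn of each list entry, in list order */
    size_t count;
} ToyTxnList;

/* Append an entry with the given first_lsn to the end of the list. */
void
toy_list_append(ToyTxnList *list, ToyLsn first_lsn)
{
    assert(list->count < TOY_MAX_TXNS);
    list->first_lsn[list->count++] = first_lsn;
}

/*
 * Return true if every entry has a strictly larger first_lsn than its
 * predecessor. This is an assumed simplification of the ordering check.
 */
bool
toy_lsn_order_ok(const ToyTxnList *list)
{
    for (size_t i = 1; i < list->count; i++)
    {
        if (list->first_lsn[i - 1] >= list->first_lsn[i])
            return false;
    }
    return true;
}
```

Appending one entry at LSN 200 keeps the check satisfied. Appending a second entry at LSN 200 makes toy_lsn_order_ok() return false, which corresponds to the failing assertion.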
3) The patch to reproduce the issue is provided in attachment 2. DML on a temporary table consumes an xid without logging any WAL record, except when it is the first subxact of the top-level xact (which logs an ASSIGNMENT record). So we use DML on a temporary table to generate two "never processed before" subxacts in one top-level xact.
4) Since the xact is already known to be a subxact before it is added to the ReorderBuffer, I think an appropriate fix is to extend the ReorderBufferXidSetCatalogChanges function with an is_top parameter that indicates whether the xact is a top-level xact.
For a subxact, it would not be added to the toplevel_by_lsn list and would not undergo the AssertTXNLsnOrder check. Of course, it is necessary to introduce a check to verify whether a node is in the list when attempting to remove a node from toplevel_by_lsn.  
The specific fix patch is provided in Attachment 3.
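As a rough sketch of the idea in 4) (this is not the attached patch, and not the real ReorderBuffer code), the toy model below shows an is_top flag keeping subxacts out of the LSN-ordered list, plus the membership check on removal. All Sketch* names are hypothetical, and the xids other than 19933 and all LSN values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; they only mimic the shape of the idea. */
typedef uint64_t SketchLsn;

#define SKETCH_MAX_TXNS 8

typedef struct SketchTxn
{
    uint32_t  xid;
    SketchLsn first_lsn;
    bool      has_catalog_changes;
    bool      in_toplevel_list;   /* is this txn linked into toplevel[]? */
} SketchTxn;

typedef struct SketchBuffer
{
    SketchTxn  txns[SKETCH_MAX_TXNS];      /* every tracked xact */
    size_t     ntxns;
    SketchTxn *toplevel[SKETCH_MAX_TXNS];  /* stand-in for toplevel_by_lsn */
    size_t     ntoplevel;
} SketchBuffer;

/*
 * Mark xid as containing catalog changes, creating its entry if needed.
 * A newly created entry is linked into the LSN-ordered list only when
 * is_top is true. Returns NULL if the toy buffer is full.
 */
SketchTxn *
sketch_set_catalog_changes(SketchBuffer *rb, uint32_t xid,
                           SketchLsn lsn, bool is_top)
{
    SketchTxn *txn = NULL;

    for (size_t i = 0; i < rb->ntxns; i++)
    {
        if (rb->txns[i].xid == xid)
            txn = &rb->txns[i];
    }

    if (txn == NULL)
    {
        if (rb->ntxns >= SKETCH_MAX_TXNS)
            return NULL;
        txn = &rb->txns[rb->ntxns++];
        txn->xid = xid;
        txn->first_lsn = lsn;
        txn->in_toplevel_list = false;
        if (is_top)
        {
            rb->toplevel[rb->ntoplevel++] = txn;
            txn->in_toplevel_list = true;
        }
    }

    txn->has_catalog_changes = true;
    return txn;
}

/* Unlink txn from the ordered list, but only if it is actually linked. */
bool
sketch_unlink_toplevel(SketchBuffer *rb, SketchTxn *txn)
{
    assert(txn != NULL);
    if (!txn->in_toplevel_list)
        return false;           /* subxact: nothing to unlink */

    for (size_t i = 0; i < rb->ntoplevel; i++)
    {
        if (rb->toplevel[i] == txn)
        {
            for (size_t j = i + 1; j < rb->ntoplevel; j++)
                rb->toplevel[j - 1] = rb->toplevel[j];
            rb->ntoplevel--;
            break;
        }
    }
    txn->in_toplevel_list = false;
    return true;
}

/* Strictly increasing first_lsn along the ordered list (assumed check). */
bool
sketch_toplevel_order_ok(const SketchBuffer *rb)
{
    for (size_t i = 1; i < rb->ntoplevel; i++)
    {
        if (rb->toplevel[i - 1]->first_lsn >= rb->toplevel[i]->first_lsn)
            return false;
    }
    return true;
}
```

With this shape, two subxacts created at the same LSN never enter the ordered list, so the ordering check only sees top-level xacts, and removing a subxact does not touch the list.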
Thanks
Haiyang Li
| Attachment | Content-Type | Size | 
|---|---|---|
| xid_19933_wal_record.txt | text/plain | 9.1 KB | 
| v1-0001-Testcase-Coredump-On-AssertTXNLsnOrder.patch | application/octet-stream | 2.5 KB | 
| v1-0002-Fix-Coredump-On-AssertTXNLsnOrder.patch | application/octet-stream | 4.3 KB | 