From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reorderbuffer crash during recovery
Date: 2019-11-07 05:31:17
Message-ID: CAFiTN-sT6QdUptzifxc8ZCjsnvuJKiLhG6K4s_2=asX76B6nQQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers
On Thu, Nov 7, 2019 at 9:55 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, Nov 6, 2019 at 5:41 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Nov 6, 2019 at 5:20 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > I found a couple of crashes in the reorderbuffer while reviewing/testing
> > > logical_work_mem and the logical streaming of large in-progress
> > > transactions. The stack traces are given below:
> > > Issue 1:
> > > #0 0x00007f985c7d8337 in raise () from /lib64/libc.so.6
> > > #1 0x00007f985c7d9a28 in abort () from /lib64/libc.so.6
> > > #2 0x0000000000ec514d in ExceptionalCondition
> > > (conditionName=0x10eab34 "!dlist_is_empty(head)", errorType=0x10eab24
> > > "FailedAssertion",
> > > fileName=0x10eab00 "../../../../src/include/lib/ilist.h",
> > > lineNumber=458) at assert.c:54
> > > #3 0x0000000000b4fd13 in dlist_tail_element_off (head=0x338fe60,
> > > off=48) at ../../../../src/include/lib/ilist.h:458
> > > #4 0x0000000000b547b7 in ReorderBufferAbortOld (rb=0x32ae7a0,
> > > oldestRunningXid=895) at reorderbuffer.c:1910
> > > #5 0x0000000000b3cb5e in DecodeStandbyOp (ctx=0x33424b0,
> > > buf=0x7fff7e7b1e40) at decode.c:332
> > > #6 0x0000000000b3c363 in LogicalDecodingProcessRecord (ctx=0x33424b0,
> > > record=0x3342770) at decode.c:121
> > > #7 0x0000000000b704b2 in XLogSendLogical () at walsender.c:2845
> > > #8 0x0000000000b6e9f8 in WalSndLoop (send_data=0xb7038b
> > > <XLogSendLogical>) at walsender.c:2199
> > > #9 0x0000000000b6bbf5 in StartLogicalReplication (cmd=0x33167a8) at
> > > walsender.c:1128
> > > #10 0x0000000000b6ce83 in exec_replication_command
> > > (cmd_string=0x328a0a0 "START_REPLICATION SLOT \"sub1\" LOGICAL 0/0
> > > (proto_version '1', publication_names '\"pub1\"')")
> > > at walsender.c:1545
> > > #11 0x0000000000c39f85 in PostgresMain (argc=1, argv=0x32b51c0,
> > > dbname=0x32b50e0 "testdb", username=0x32b50c0 "user1") at
> > > postgres.c:4256
> > > #12 0x0000000000b10dc7 in BackendRun (port=0x32ad890) at postmaster.c:4498
> > > #13 0x0000000000b0ff3e in BackendStartup (port=0x32ad890) at postmaster.c:4189
> > > #14 0x0000000000b08505 in ServerLoop () at postmaster.c:1727
> > > #15 0x0000000000b0781a in PostmasterMain (argc=3, argv=0x3284cb0) at
> > > postmaster.c:1400
> > > #16 0x000000000097492d in main (argc=3, argv=0x3284cb0) at main.c:210
> > >
> > > Issue 2:
> > > #0 0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> > > #1 0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> > > #2 0x0000000000ec4e1d in ExceptionalCondition
> > > (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> > > errorType=0x10ea284 "FailedAssertion",
> > > fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> > > #3 0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> > > txn=0x2bafb08) at reorderbuffer.c:3052
> > > #4 0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > > txn=0x2bafb08) at reorderbuffer.c:1318
> > > #5 0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > > txn=0x2b9d778) at reorderbuffer.c:1257
> > > #6 0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> > > oldestRunningXid=3835) at reorderbuffer.c:1973
> > > #7 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> > > buf=0x7ffcbc74cc00) at decode.c:332
> > > #8 0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> > > record=0x2b67990) at decode.c:121
> > > #9 0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
> > >
> > > From the initial analysis it looks like:
> > > Issue 1 seems to occur if all of the reorderbuffer contents have been
> > > flushed (serialized to disk) and the server then restarts.
> > > Issue 2 seems to occur if there are many subtransactions present and
> > > the server then restarts. The subtransaction's final_lsn is not being
> > > set, so the assert fails when ReorderBufferRestoreCleanup is called.
> > > We may have to set the subtransaction's final_lsn before cleanup (not
> > > sure).
> > >
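
Regarding issue 2, something along these lines is what I imagine you mean
by setting the subtransaction's final_lsn before cleanup. This is only a
sketch to make the idea concrete, not a proposed patch; the helper name is
invented, but it uses the existing dlist API and ReorderBufferTXN fields
from reorderbuffer.c:

/*
 * Sketch only: before cleaning up a crashed transaction, give every
 * subtransaction that never saw its commit/abort record a final_lsn,
 * so that the assertion in ReorderBufferRestoreCleanup() holds.
 */
static void
ReorderBufferSetSubtxnFinalLSN(ReorderBufferTXN *txn)
{
	dlist_iter	iter;

	dlist_foreach(iter, &txn->subtxns)
	{
		ReorderBufferTXN *subtxn;

		subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);

		if (subtxn->final_lsn == InvalidXLogRecPtr)
			subtxn->final_lsn = txn->final_lsn;
	}
}

Whether the top-level transaction's final_lsn is actually the right value
to use for a subtransaction is exactly the part I am not sure about either.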
> > > I could not reproduce this issue consistently with a test case, but it
> > > looks like a problem from code review.
> > >
> > > For issue 1, I could reproduce it with the following steps:
> > > 1) Change ReorderBufferCheckSerializeTXN so that it always serializes
> > > (a sketch follows below).
> > > 2) Have many open transactions, with subtransactions open in them.
> > > 3) Attach to one of the transactions with gdb and call abort().
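
For anyone trying to reproduce this, I take step 1 to mean a throwaway
hack like the following, which forces every transaction to spill to disk
immediately. Sketch only, assuming the current two-argument signature in
reorderbuffer.c:

/*
 * Reproduction hack only: serialize unconditionally instead of waiting
 * for nentries_mem to reach max_changes_in_memory, so the in-memory
 * changes list is empty after decoding each change.
 */
static void
ReorderBufferCheckSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
	ReorderBufferSerializeTXN(rb, txn);
	Assert(txn->nentries_mem == 0);
}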
> >
> > Do you need subtransactions for issue 1? It appears that after the
> > restart, if the changes list is empty, it will hit the assert. Am I
> > missing something?
> >
>
> When I reported this issue, I could reproduce it with subtransactions.
> I have now tried without subtransactions and can still reproduce it.
> You are right: issue 1 appears in both cases, with and without
> subtransactions.
Okay, thanks for the confirmation.
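
Given that, the trigger for issue 1 is simply that ReorderBufferAbortOld()
derives final_lsn from the in-memory changes list, which can be empty after
a restart once everything has been spilled to disk. A minimal sketch of a
guard is below; note this only dodges the assertion, and final_lsn would
then have to be remembered at serialization time instead (sketch only, not
a proposed fix):

	/*
	 * Sketch only: skip the dlist_tail_element() call when all of the
	 * transaction's changes have been serialized and the in-memory list
	 * is empty, which is what trips the !dlist_is_empty() assertion.
	 */
	if (txn->serialized && txn->final_lsn == InvalidXLogRecPtr &&
		!dlist_is_empty(&txn->changes))
	{
		ReorderBufferChange *last =
			dlist_tail_element(ReorderBufferChange, node, &txn->changes);

		txn->final_lsn = last->lsn;
	}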
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com