From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [BUG] "FailedAssertion" reported when streaming in logical replication |
Date: | 2021-04-26 12:25:34 |
Message-ID: | CAFiTN-uNPy7syp0GL1aS8k6B6xbOsxhhsj2NGaxbCuGdenVNkg@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Apr 26, 2021 at 1:26 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, 26 Apr 2021 at 12:45 PM, tanghy(dot)fnst(at)fujitsu(dot)com <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
>>
>> Hi
>>
>> I think I may have found a bug when using streaming in logical replication. Could anyone please take a look at this?
>>
>> Here's what I did to produce the problem.
>> I set logical_decoding_work_mem and created multiple publications on the publisher, then created multiple subscriptions with "streaming = on" on the subscriber.
>> However, an assertion failed on the publisher when I COMMIT and ROLLBACK multiple transactions at the same time.
>>
>> The log reported a FailedAssertion:
>> TRAP: FailedAssertion("txn->size == 0", File: "reorderbuffer.c", Line: 3465, PID: 911730)
>>
>> The problem happens in both synchronous mode and asynchronous mode. When there are only one or two publications, it doesn't seem to happen. (In my case, there are 8 publications and the failure always happened).
>>
>> The scripts and the log are attached. It took me about 4 minutes to run the script on my machine.
>> Please contact me if you need more specific info for the problem.
>
>
>
> Thanks for reporting. I will look into it.
I am able to reproduce this, and I think I have done the initial
investigation. The cause of the issue is that this transaction has
only one change, and that change is XLOG_HEAP2_NEW_CID, which is added
through SnapBuildProcessNewCid. Basically, when we add a change
through SnapBuildProcessChange we set the base snapshot, but when we
add one through SnapBuildProcessNewCid we don't set the base snapshot,
because there is nothing to be done for this change. Now, this
transaction is identified as the biggest transaction with non-partial
changes, and ReorderBufferStreamTXN returns immediately because
base_snapshot is NULL. So nothing is streamed, the transaction's size
stays nonzero, and the assertion txn->size == 0 fails. I think the fix
should be that, while selecting the largest transaction in
ReorderBufferLargestTopTXN, we check that base_snapshot is not NULL.
I will think more about this and post the patch.
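
To make the proposed check concrete, here is a minimal, self-contained
sketch of the idea. It is not the actual reorderbuffer.c code and not
the patch: ToyTxn, toy_stream_txn, toy_largest_streamable_txn and
toy_evict_largest are hypothetical names invented for illustration, and
the real ReorderBufferTXN carries many more fields. It only models two
things: streaming returns early when there is no base snapshot, and the
selection step skips such transactions so the size assertion holds.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for ReorderBufferTXN: only the two fields that matter here. */
typedef struct ToyTxn
{
    size_t      size;           /* bytes of changes still queued in memory */
    const void *base_snapshot;  /* NULL when no base snapshot was ever set */
} ToyTxn;

/*
 * Models the early return described above: without a base snapshot,
 * nothing is streamed and the size is left untouched.
 */
void
toy_stream_txn(ToyTxn *txn)
{
    if (txn->base_snapshot == NULL)
        return;
    txn->size = 0;              /* pretend every change was sent downstream */
}

/*
 * Pick the largest transaction that can actually be streamed.
 * Transactions with no queued changes or no base snapshot are skipped.
 * Returns NULL when no transaction qualifies.
 */
ToyTxn *
toy_largest_streamable_txn(ToyTxn *txns, size_t ntxns)
{
    ToyTxn *largest = NULL;

    for (size_t i = 0; i < ntxns; i++)
    {
        ToyTxn *txn = &txns[i];

        if (txn->size == 0 || txn->base_snapshot == NULL)
            continue;
        if (largest == NULL || txn->size > largest->size)
            largest = txn;
    }
    return largest;
}

/*
 * Caller side: stream the selected transaction, then check the same
 * invariant that tripped in the report (size must be 0 after streaming).
 */
void
toy_evict_largest(ToyTxn *txns, size_t ntxns)
{
    ToyTxn *txn = toy_largest_streamable_txn(txns, ntxns);

    if (txn == NULL)
        return;                 /* nothing streamable right now */
    toy_stream_txn(txn);
    assert(txn->size == 0);
}
```

With an 80-byte transaction that has no base snapshot and a 40-byte one
that does, this sketch streams the 40-byte one. Without the
base_snapshot test in the selection loop, the 80-byte one would be
chosen and the assert would fire.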
From the core dump, we can see that base_snapshot is 0x0 and
ntuplecids = 1; txn_flags = 1 also shows that it has a new
command id change. The size of the txn also shows that it has
only one change, and that is REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
because in that case the change size is just
sizeof(ReorderBufferChange), which is 80.
(gdb) p *txn
$4 = {txn_flags = 1, xid = 1115, toplevel_xid = 0, gid = 0x0,
first_lsn = 1061159120, final_lsn = 0, end_lsn = 0, toptxn = 0x0,
restart_decoding_lsn = 958642624,
origin_id = 0, origin_lsn = 0, commit_time = 0, base_snapshot = 0x0,
base_snapshot_lsn = 0, base_snapshot_node = {prev = 0x0, next = 0x0},
snapshot_now = 0x0,
command_id = 4294967295, nentries = 1, nentries_mem = 1, changes =
{head = {prev = 0x3907c18, next = 0x3907c18}}, tuplecids = {head =
{prev = 0x39073d8,
next = 0x39073d8}}, ntuplecids = 1, tuplecid_hash = 0x0,
toast_hash = 0x0, subtxns = {head = {prev = 0x30f1cd8, next =
0x30f1cd8}}, nsubtxns = 0,
ninvalidations = 0, invalidations = 0x0, node = {prev = 0x30f1a98,
next = 0x30c64f8}, size = 80, total_size = 80, concurrent_abort =
false,
output_plugin_private = 0x0}
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com