BUG #18055: logical decoding core on AllocateSnapshotBuilder()

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: ocean_li_996(at)163(dot)com
Subject: BUG #18055: logical decoding core on AllocateSnapshotBuilder()
Date: 2023-08-14 16:04:48
Message-ID: 18055-ab3beed9f4b7b7d6@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18055
Logged by: ocean li
Email address: ocean_li_996(at)163(dot)com
PostgreSQL version: 11.9
Operating system: centos7 5.10.84 x86_64
Description:

While testing the logical decoding module with the
*pg_logical_slot_get_changes* function, I sometimes got a core dump with the
following stack:
==>
#0 0x00007f744a7b9277 in raise () from /lib64/libc.so.6
#1 0x00007f744a7ba968 in abort () from /lib64/libc.so.6
#2 0x00000000010edd37 in ExceptionalCondition (conditionName=0x17e2b58
"!(NInitialRunningXacts == 0 && InitialRunningXacts == ((void *)0))",
errorType=0x17e2b45 "FailedAssertion", fileName=0x17e2b39 "snapbuild.c",
lineNumber=381) at assert.c:46
#3 0x0000000000e60b46 in AllocateSnapshotBuilder (reorder=0x551ea98,
xmin_horizon=0, start_lsn=1267160216, need_full_snapshot=false) at
snapbuild.c:381
#4 0x0000000000e50f70 in StartupDecodingContext (output_plugin_options=0x0,
start_lsn=1267160216, xmin_horizon=0, need_full_snapshot=false,
fast_forward=false, read_page=0xe53023 <logical_read_local_xlog_page>,
prepare_write=0xe52df6 <LogicalOutputPrepareWrite>, do_write=0xe52e24
<LogicalOutputWrite>, update_progress=0x0) at logical.c:191
#5 0x0000000000e518b8 in CreateDecodingContext (start_lsn=1267160216,
output_plugin_options=0x0, fast_forward=false, read_page=0xe53023
<logical_read_local_xlog_page>, prepare_write=0xe52df6
<LogicalOutputPrepareWrite>, do_write=0xe52e24 <LogicalOutputWrite>,
update_progress=0x0) at logical.c:486
#6 0x0000000000e53735 in pg_logical_slot_get_changes_guts
(fcinfo=0x7ffcd879e3d0, confirm=true, binary=false) at logicalfuncs.c:259
#7 0x0000000000e53b1c in pg_logical_slot_get_changes
(fcinfo=0x7ffcd879e3d0) at logicalfuncs.c:393
#8 0x00000000010ff89e in FunctionCallInvokeCheckSPL (fcinfo=0x7ffcd879e3d0)
at fmgr.c:2262
...
==>
In frame #3 of the stack above, one of the cores shows that
NInitialRunningXacts is 2 and InitialRunningXacts is not NULL.

The purpose of NInitialRunningXacts and InitialRunningXacts is clear. As far
as I know, the core may be caused as follows: an ERROR is raised while
*pg_logical_slot_get_changes_guts* is running, and the PG_CATCH() block does
not reset NInitialRunningXacts and InitialRunningXacts. When
pg_logical_slot_get_changes_guts is called again, the assertion may then
fail. Unfortunately, I couldn't find a minimal reproduction case. However, I
observed an *ERROR: canceling statement due to statement timeout* logged
before each core occurred. (For some reason, I can't provide the log
itself.)
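
The suspected sequence described above (an error leaves the file-level
variables set, and the next call fails the assertion) can be illustrated with
a small standalone C simulation. This is only a sketch of the hypothesis, not
PostgreSQL code: setjmp/longjmp stands in for PG_TRY()/PG_CATCH(), and
simulate_get_changes(), builder_state_is_clean(), reset_builder_state() and
error_context are hypothetical names. It also assumes that a call which
completes normally leaves the state clean.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the file-level variables named in the assertion. */
static unsigned int *InitialRunningXacts = NULL;
static int NInitialRunningXacts = 0;

/* Hypothetical stand-in for the error-recovery jump target. */
static jmp_buf error_context;

/* The condition that the failed assertion in the trace checks. */
static bool
builder_state_is_clean(void)
{
    return NInitialRunningXacts == 0 && InitialRunningXacts == NULL;
}

/* Hypothetical cleanup step; the report suspects the error path lacks it. */
static void
reset_builder_state(void)
{
    free(InitialRunningXacts);
    InitialRunningXacts = NULL;
    NInitialRunningXacts = 0;
}

/*
 * Simulates one decoding call. Returns false if the state was already
 * dirty on entry, which is where the real code would hit the assertion.
 */
static bool
simulate_get_changes(bool raise_error, bool cleanup_on_error)
{
    if (!builder_state_is_clean())
        return false;

    if (setjmp(error_context) == 0)
    {
        /* Decoding records two running transactions, as in the core. */
        NInitialRunningXacts = 2;
        InitialRunningXacts = calloc(2, sizeof(unsigned int));

        if (raise_error)
            longjmp(error_context, 1);  /* e.g. a statement timeout */

        reset_builder_state();          /* assumed normal completion */
    }
    else if (cleanup_on_error)
        reset_builder_state();          /* the step suspected to be missing */

    return true;
}
```

With cleanup_on_error set to false, a call that raises the simulated error
leaves NInitialRunningXacts at 2, and the following call reports dirty state
on entry. With cleanup_on_error set to true, the following call starts clean.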
