BUG #17893: Assert failed in heap_update()/_delete() when FK modified by RI trigger in non-read-committed xact

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #17893: Assert failed in heap_update()/_delete() when FK modified by RI trigger in non-read-committed xact
Date: 2023-04-12 12:00:01
Message-ID: 17893-35847009eec517b5@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17893
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

The following script:
echo "
CREATE TABLE pkt(i int, PRIMARY KEY(i));
CREATE TABLE fkt(i int REFERENCES pkt ON DELETE SET NULL);
" | psql

for ((i=1;i<=50;i++)); do
echo "INSERT INTO pkt VALUES (1); INSERT INTO fkt VALUES (1);" | psql &
echo "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; DELETE FROM pkt
WHERE i = 1; COMMIT;" | psql &
wait
grep 'TRAP:' server.log && break
done

triggers an assertion failure:
...
iteration 5
BEGIN
INSERT 0 1
INSERT 0 1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
TRAP: FailedAssertion("!ItemPointerEquals(&oldtup.t_self,
&oldtup.t_data->t_ctid)", File: "heapam.c", Line: 3492, PID: 2563451)

with the following stack trace:
Core was generated by `postgres: law regression [local] DELETE
'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/2521461' in core file too small.
#0 __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139782217279296) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139782217279296) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139782217279296) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139782217279296, signo=signo(at)entry=6) at
./nptl/pthread_kill.c:89
#3 0x00007f219521a476 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007f21952007f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000056068b9e4e2c in ExceptionalCondition (
conditionName=0x56068ba7f088 "!ItemPointerEquals(&oldtup.t_self,
&oldtup.t_data->t_ctid)",
errorType=0x56068ba7e3c9 "FailedAssertion", fileName=0x56068ba7e4e8
"heapam.c", lineNumber=3492)
at assert.c:69
#6 0x000056068b313dd3 in heap_update (relation=0x7f2195671cb0,
otid=0x7ffe2a4bf2c2, newtup=0x56068d987bd0,
cid=1, crosscheck=0x56068d8a0748, wait=true, tmfd=0x7ffe2a4bf368,
lockmode=0x7ffe2a4bf284) at heapam.c:3492
#7 0x000056068b324f02 in heapam_tuple_update (relation=0x7f2195671cb0,
otid=0x7ffe2a4bf2c2,
slot=0x56068d98a130, cid=1, snapshot=0x56068d8a0bb8,
crosscheck=0x56068d8a0748, wait=true,
tmfd=0x7ffe2a4bf368, lockmode=0x7ffe2a4bf284,
update_indexes=0x7ffe2a4bf281) at heapam_handler.c:327
#8 0x000056068b5ee748 in table_tuple_update (rel=0x7f2195671cb0,
otid=0x7ffe2a4bf2c2, slot=0x56068d98a130,
cid=1, snapshot=0x56068d8a0bb8, crosscheck=0x56068d8a0748, wait=true,
tmfd=0x7ffe2a4bf368,
lockmode=0x7ffe2a4bf284, update_indexes=0x7ffe2a4bf281) at
../../../src/include/access/tableam.h:1511
#9 0x000056068b5f1cf8 in ExecUpdateAct (context=0x7ffe2a4bf340,
resultRelInfo=0x56068d9861c8,
tupleid=0x7ffe2a4bf2c2, oldtuple=0x0, slot=0x56068d98a130,
canSetTag=true, updateCxt=0x7ffe2a4bf280)
at nodeModifyTable.c:2077
#10 0x000056068b5f2267 in ExecUpdate (context=0x7ffe2a4bf340,
resultRelInfo=0x56068d9861c8,
tupleid=0x7ffe2a4bf2c2, oldtuple=0x0, slot=0x56068d98a130,
canSetTag=true) at nodeModifyTable.c:2297
#11 0x000056068b5f4e65 in ExecModifyTable (pstate=0x56068d985fb0) at
nodeModifyTable.c:3798
#12 0x000056068b5b1731 in ExecProcNodeFirst (node=0x56068d985fb0) at
execProcnode.c:464
#13 0x000056068b5a49ad in ExecProcNode (node=0x56068d985fb0) at
../../../src/include/executor/executor.h:259
#14 0x000056068b5a78b4 in ExecutePlan (estate=0x56068d985d40,
planstate=0x56068d985fb0, use_parallel_mode=false,
operation=CMD_UPDATE, sendTuples=false, numberTuples=0,
direction=ForwardScanDirection,
dest=0x56068bd34d20 <spi_printtupDR>, execute_once=true) at
execMain.c:1636
#15 0x000056068b5a50e4 in standard_ExecutorRun (queryDesc=0x56068d9839f8,
direction=ForwardScanDirection,
count=0, execute_once=true) at execMain.c:363
#16 0x000056068b5a4ecf in ExecutorRun (queryDesc=0x56068d9839f8,
direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:307
#17 0x000056068b610e2d in _SPI_pquery (queryDesc=0x56068d9839f8,
fire_triggers=false, tcount=0) at spi.c:2908
#18 0x000056068b610830 in _SPI_execute_plan (plan=0x56068d97ad00,
options=0x7ffe2a4bf690,
snapshot=0x56068bd69440 <SecondarySnapshotData>,
crosscheck_snapshot=0x56068d8a0748, fire_triggers=false)
at spi.c:2677
#19 0x000056068b60d39b in SPI_execute_snapshot (plan=0x56068d97ad00,
Values=0x7ffe2a4bf760,
Nulls=0x7ffe2a4bf960 " \320\225\215\006V", snapshot=0x56068bd69440
<SecondarySnapshotData>,
crosscheck_snapshot=0x56068d8a0748, read_only=false,
fire_triggers=false, tcount=0) at spi.c:795
#20 0x000056068b94654a in ri_PerformCheck (riinfo=0x56068d96f890,
qkey=0x7ffe2a4bfa58, qplan=0x56068d97ad00,
fk_rel=0x7f2195671cb0, pk_rel=0x7f2195665788, oldslot=0x56068d957510,
newslot=0x0, detectNewRows=true,
expect_OK=9) at ri_triggers.c:2408
#21 0x000056068b943b14 in ri_set (trigdata=0x7ffe2a4bff40, is_set_null=true,
tgkind=3) at ri_triggers.c:1185
#22 0x000056068b943398 in RI_FKey_setnull_del (fcinfo=0x7ffe2a4bfd80) at
ri_triggers.c:980
#23 0x000056068b5681a3 in ExecCallTriggerFunc (trigdata=0x7ffe2a4bff40,
tgindx=0, finfo=0x56068d955fd0,
instr=0x0, per_tuple_context=0x56068d973b80) at trigger.c:2430
#24 0x000056068b56beaf in AfterTriggerExecute (estate=0x56068d955750,
event=0x56068d971cb0,
relInfo=0x56068d955bd8, src_relInfo=0x56068d955bd8,
dst_relInfo=0x56068d955bd8, trigdesc=0x56068d955df0,
finfo=0x56068d955fd0, instr=0x0, per_tuple_context=0x56068d973b80,
trig_tuple_slot1=0x0,
trig_tuple_slot2=0x0) at trigger.c:4541
#25 0x000056068b56c4c6 in afterTriggerInvokeEvents (events=0x56068d8a09a0,
firing_id=1, estate=0x56068d955750,
delete_ok=false) at trigger.c:4780
#26 0x000056068b56ce12 in AfterTriggerEndQuery (estate=0x56068d955750) at
trigger.c:5135
#27 0x000056068b5a52f2 in standard_ExecutorFinish (queryDesc=0x56068d955340)
at execMain.c:438
#28 0x000056068b5a51a8 in ExecutorFinish (queryDesc=0x56068d955340) at
execMain.c:406
#29 0x000056068b81d50a in ProcessQuery (plan=0x56068d942e70,
sourceText=0x56068d8797a0 "DELETE FROM pkt WHERE i = 1;", params=0x0,
queryEnv=0x0, dest=0x56068d943850,
qc=0x7ffe2a4c03d0) at pquery.c:193
#30 0x000056068b81efe3 in PortalRunMulti (portal=0x56068d8e8030,
isTopLevel=true, setHoldSnapshot=false,
dest=0x56068d943850, altdest=0x56068d943850, qc=0x7ffe2a4c03d0) at
pquery.c:1277
#31 0x000056068b81e4c6 in PortalRun (portal=0x56068d8e8030,
count=9223372036854775807, isTopLevel=true,
run_once=true, dest=0x56068d943850, altdest=0x56068d943850,
qc=0x7ffe2a4c03d0) at pquery.c:791
#32 0x000056068b8172a1 in exec_simple_query (query_string=0x56068d8797a0
"DELETE FROM pkt WHERE i = 1;")
at postgres.c:1250
#33 0x000056068b81c1c0 in PostgresMain (dbname=0x56068d8a5ea8 "regression",
username=0x56068d8a5e88 "law")
at postgres.c:4593
#34 0x000056068b7414e1 in BackendRun (port=0x56068d89e980) at
postmaster.c:4511
#35 0x000056068b740d68 in BackendStartup (port=0x56068d89e980) at
postmaster.c:4239
#36 0x000056068b73ccf5 in ServerLoop () at postmaster.c:1806
#37 0x000056068b73c452 in PostmasterMain (argc=3, argv=0x56068d873a00) at
postmaster.c:1478
#38 0x000056068b6301d3 in main (argc=3, argv=0x56068d873a00) at main.c:202

With the cascading delete:
CREATE TABLE fkt(i int REFERENCES pkt ON DELETE CASCADE);
I get an assertion failure inside heap_delete():
iteration 10
BEGIN
INSERT 0 1
INSERT 0 1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
TRAP: FailedAssertion("!(tp.t_data->t_infomask & HEAP_XMAX_INVALID)", File:
"heapam.c", Line: 2872, PID: 2564424)

The first assert appeared with 5db6df0c0, so this failure reproduces on
REL_12_STABLE .. master. On REL_11_STABLE, "ON DELETE SET NULL" triggers an
assert similar to the one in heap_delete().
The same failure is also observed with the serializable isolation level.

Initial analysis of the first case shows that, inside heap_update(), for an
assert-triggering tuple HeapTupleSatisfiesUpdate() returns TM_Ok (because
the tuple's xmin is committed and its xmax is invalid), but
HeapTupleSatisfiesVisibility() returns false (because
XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot) == true).
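
For reference, here is roughly how that check sits relative to the failing
assertion, reconstructed from the assertion text and the trace above (a
sketch, not a verbatim excerpt of heapam.c):

    /* Sketch of the crosscheck path in heap_update(); reconstructed from
     * the assertion text above, not copied verbatim from heapam.c. */
    result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
    ...
    if (crosscheck != InvalidSnapshot && result == TM_Ok)
    {
        /* extra visibility check for transaction-snapshot-mode RI updates */
        if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
        {
            result = TM_Updated;
            /* assumes a crosscheck-invisible tuple must have a successor */
            Assert(!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid));
        }
    }

So the assertion expects a tuple that fails the crosscheck to have been
updated already (t_ctid pointing to a newer version), but here the tuple is
invisible only because its inserting transaction is not in the crosscheck
snapshot; it has never been updated, t_self still equals t_ctid, and the
assertion fires.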

Debug logging added to heap_update() shows:
2023-04-12 12:53:40.923 MSK|law|regression|64367fa4.259930|LOG: heap_update()
  HeapTupleSatisfiesUpdate(&oldtup, cid, buffer): 0,
  crosscheck: 0x56476f5dc698,
  HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer): 1,
  HeapTupleHeaderXminCommitted(oldtup.t_data): 1,
  oldtup.t_data->t_infomask: 00000900,
  HeapTupleHeaderXminFrozen(oldtup.t_data): 0,
  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(oldtup.t_data), crosscheck)): 0,
  HeapTupleHeaderGetRawXmin(oldtup.t_data): 731,
  crosscheck->xmin: 732, crosscheck->xmax: 732
^^^ a valid situation

2023-04-12 12:53:40.931 MSK|law|regression|64367fa4.25993a|LOG: heap_update()
  HeapTupleSatisfiesUpdate(&oldtup, cid, buffer): 0,
  crosscheck: 0x56476f5dc698,
  HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer): 0,
  HeapTupleHeaderXminCommitted(oldtup.t_data): 1,
  oldtup.t_data->t_infomask: 00000900,
  HeapTupleHeaderXminFrozen(oldtup.t_data): 0,
  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(oldtup.t_data), crosscheck)): 1,
  HeapTupleHeaderGetRawXmin(oldtup.t_data): 736,
  crosscheck->xmin: 736, crosscheck->xmax: 736
^^^ leads to the assertion failure

So in the latter case the tuple's xmin is not committed according to the
crosscheck snapshot. Meanwhile, a comment in nodeModifyTable.c says:
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
* can't-serialize error if not. ...

And indeed, with a non-assert build I get:
ERROR: could not serialize access due to concurrent update
in these cases.
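
For context, that error presumably comes from the executor's handling of the
TM_Updated result under a transaction-snapshot isolation level; roughly (a
sketch based on the comment quoted above, not a verbatim excerpt of
nodeModifyTable.c):

    /* Sketch of how the executor reacts when table_tuple_update() reports
     * TM_Updated; based on the nodeModifyTable.c comment quoted above,
     * not copied verbatim. */
    case TM_Updated:
        if (IsolationUsesXactSnapshot())
            ereport(ERROR,
                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                     errmsg("could not serialize access due to concurrent update")));
        /* under READ COMMITTED the executor would instead re-fetch the
         * latest tuple version (EvalPlanQual) rather than erroring out */
        break;

So with asserts disabled, heap_update() returns TM_Updated for such tuples
and the RI update simply fails with the can't-serialize error, matching the
behavior described above.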
