BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger
Date: 2023-02-17 10:00:00
Message-ID: 17798-0907404928dcf0dd@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17798
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

When executing the following queries under valgrind:
cat << "EOF" | psql
CREATE TABLE bruttest(cnt int DEFAULT 0, t text);
CREATE FUNCTION noop_tfunc() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN
RETURN NEW; END$$;
CREATE TRIGGER brupdate_trigger BEFORE UPDATE ON bruttest FOR EACH ROW
EXECUTE FUNCTION noop_tfunc();
INSERT INTO bruttest(t) VALUES (repeat('x', 1000));
EOF

for ((i=1;i<=4;i++)); do
echo "iteration $i"
psql -c "UPDATE bruttest SET cnt = cnt + 1" &
psql -c "UPDATE bruttest SET cnt = cnt + 1" &
wait
done

I get a failure:
...
iteration 4
UPDATE 1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

server.log contains:
==00:00:00:08.877 3381345== Invalid read of size 1
==00:00:00:08.877 3381345== at 0x1E4721: heap_compute_data_size
(heaptuple.c:138)
==00:00:00:08.877 3381345== by 0x1E5674: heap_form_tuple
(heaptuple.c:1061)
==00:00:00:08.877 3381345== by 0x413AF6: tts_buffer_heap_materialize
(execTuples.c:748)
==00:00:00:08.877 3381345== by 0x414CB9: ExecFetchSlotHeapTuple
(execTuples.c:1654)
==00:00:00:08.877 3381345== by 0x3DF10B: ExecBRUpdateTriggers
(trigger.c:3074)
==00:00:00:08.877 3381345== by 0x438E2C: ExecUpdatePrologue
(nodeModifyTable.c:1912)
==00:00:00:08.877 3381345== by 0x43A623: ExecUpdate
(nodeModifyTable.c:2269)
==00:00:00:08.877 3381345== by 0x43D8BD: ExecModifyTable
(nodeModifyTable.c:3856)
==00:00:00:08.877 3381345== by 0x40E7F1: ExecProcNodeFirst
(execProcnode.c:464)
==00:00:00:08.877 3381345== by 0x406E19: ExecProcNode (executor.h:259)
==00:00:00:08.877 3381345== by 0x406E19: ExecutePlan (execMain.c:1636)
==00:00:00:08.877 3381345== by 0x406FF9: standard_ExecutorRun
(execMain.c:363)
==00:00:00:08.877 3381345== by 0x4070C5: ExecutorRun (execMain.c:307)
==00:00:00:08.877 3381345== Address 0x957d114 is in a rw- anonymous
segment
==00:00:00:08.877 3381345==
{
<insert_a_suppression_name_here>
...
==00:00:00:08.877 3381345==
==00:00:00:08.877 3381345== Exit program on first error
(--exit-on-first-error=yes)
2023-02-17 11:27:17.663 MSK [3381268] LOG: server process (PID 3381345)
exited with exit code 1
2023-02-17 11:27:17.663 MSK [3381268] DETAIL: Failed process was running:
UPDATE bruttest SET cnt = cnt + 1

I've also seen a relevant failure detected by asan (on master):
==1912964==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x7f223dd52c78 at pc 0x55c9e846146a bp 0x7fff0c63e560 sp 0x7fff0c63dd30
WRITE of size 42205122 at 0x7f223dd52c78 thread T0
...
#8 0x000055c9e846953f in __asan::ReportGenericError(unsigned long, unsigned
long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool)
()
#9 0x000055c9e846148c in __asan_memcpy ()
#10 0x000055c9e84d09c5 in fill_val (att=<optimized out>, bit=<optimized
out>, bitmask=bitmask(at)entry=0x7fff0c63e620,
dataP=dataP(at)entry=0x7fff0c63e5e0,
infomask=infomask(at)entry=0x7f223dd09864, datum=<optimized out>,
isnull=<optimized out>) at heaptuple.c:270
#11 0x000055c9e84d0080 in heap_fill_tuple (tupleDesc=<optimized out>,
values=<optimized out>, isnull=<optimized out>,
data=<optimized out>, data_size=<optimized out>, infomask=<optimized
out>, bit=<optimized out>) at heaptuple.c:336
#12 0x000055c9e84d530e in heap_form_tuple (tupleDescriptor=0x7f22898f76e8,
values=0x1d3084, isnull=0x6250000c85d8)
at heaptuple.c:1090
#13 0x000055c9e8bc8d0e in tts_buffer_heap_materialize (slot=0x6250000c8548)
at execTuples.c:748
#14 0x000055c9e8bcd02a in ExecFetchSlotHeapTuple
(slot=slot(at)entry=0x6250000c8548, materialize=<optimized out>,
shouldFree=shouldFree(at)entry=0x7fff0c63e810) at execTuples.c:1654
#15 0x000055c9e8afbd9a in ExecBRUpdateTriggers (estate=<optimized out>,
epqstate=<optimized out>,
relinfo=0x6250000a2e80, tupleid=<optimized out>,
fdw_trigtuple=<optimized out>, newslot=<optimized out>,
tmfd=<optimized out>) at trigger.c:3022
...

According to git bisect, first bad commit for this anomaly is 86dc90056.

That commit changed just one thing in trigger.c:
- epqslot_clean =
ExecFilterJunk(relinfo->ri_junkFilter, epqslot_candidate);
+ epqslot_clean = ExecGetUpdateNewTuple(relinfo,
epqslot_candidate,
+
oldslot);

I've found the following explanation for the failure:
1) After the ExecGetUpdateNewTuple() call the newslot and the oldslot are
linked together (their slot->tts_values[1] in this case point to the same
memory address (inside the oldslot' buffer)).
2) Previously, GetTupleForTrigger() could get a tuple with a new buffer,
so the oldslot would be the only user of the buffer at that moment.
(The newslot doesn't become an official user of the buffer.)
3) Then, trigtuple = ExecFetchSlotHeapTuple(oldslot, ...) invokes
tts_buffer_heap_materialize() where the oldslot->buffer is released.
4) Finally, newtuple = ExecFetchSlotHeapTuple(newslot, ...) invokes
tts_buffer_heap_materialize() where an incorrect access to memory
that became anonymous occurs, and that is detected by valgrind.
If not detected, different consequences are possible (in the asan case
it was memcpy with an incorrectly read extra large data_len).

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Samo Dadela 2023-02-17 11:21:25 pg_restore: warning: could not find where to insert IF EXISTS in statement "-- *not* dropping schema, since initdb creates it
Previous Message David G. Johnston 2023-02-16 19:31:08 Re: BUG #17797: connection error