BUG #17843: Writing uninitialised data in logtape/buffile

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #17843: Writing uninitialised data in logtape/buffile
Date: 2023-03-14 18:00:03
Message-ID: 17843-11a124c99750fbe3@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17843
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 15.2
Operating system: Ubuntu 22.04
Description:

The following slightly modified excerpt from
src/test/regress/sql/groupingsets.sql:
set work_mem = '64kB';
set hash_mem_multiplier = 2.0;
create table gs_data_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g
from generate_series(0,1999) g;

select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);

produces a valgrind-detected error:
==00:00:00:04.149 3088935== Syscall param pwrite64(buf) points to
uninitialised byte(s)
==00:00:00:04.149 3088935== at 0x54259AA: pwrite (pwrite64.c:25)
==00:00:00:04.149 3088935== by 0x58D32F: FileWrite (fd.c:2215)
==00:00:00:04.149 3088935== by 0x589BFC: BufFileDumpBuffer
(buffile.c:529)
==00:00:00:04.149 3088935== by 0x589DF4: BufFileFlush (buffile.c:672)
==00:00:00:04.149 3088935== by 0x58A672: BufFileSeek (buffile.c:751)
==00:00:00:04.149 3088935== by 0x58A74D: BufFileSeekBlock
(buffile.c:802)
==00:00:00:04.149 3088935== by 0x7490B4: ltsWriteBlock (logtape.c:263)
==00:00:00:04.149 3088935== by 0x7495B2: LogicalTapeWrite
(logtape.c:815)
==00:00:00:04.149 3088935== by 0x41B003: hashagg_spill_tuple
(nodeAgg.c:2957)
==00:00:00:04.149 3088935== by 0x41D003: lookup_hash_entries
(nodeAgg.c:2120)
==00:00:00:04.149 3088935== by 0x41E39F: agg_retrieve_direct
(nodeAgg.c:2432)
==00:00:00:04.149 3088935== by 0x41E55F: ExecAgg (nodeAgg.c:2161)
==00:00:00:04.149 3088935== Address 0x4e70c118 is 120 bytes inside a block
of size 8,264 client-defined
==00:00:00:04.149 3088935== at 0x743797: palloc (mcxt.c:1093)
==00:00:00:04.149 3088935== by 0x5899AA: makeBufFileCommon
(buffile.c:115)
==00:00:00:04.149 3088935== by 0x5899F0: makeBufFile (buffile.c:136)
==00:00:00:04.149 3088935== by 0x589FE5: BufFileCreateTemp
(buffile.c:207)
==00:00:00:04.149 3088935== by 0x74922E: LogicalTapeSetCreate
(logtape.c:602)
==00:00:00:04.149 3088935== by 0x41ACF1: hash_agg_enter_spill_mode
(nodeAgg.c:1876)
==00:00:00:04.149 3088935== by 0x41AE16: hash_agg_check_limits
(nodeAgg.c:1854)
==00:00:00:04.149 3088935== by 0x41AE3F: initialize_hash_entry
(nodeAgg.c:2034)
==00:00:00:04.149 3088935== by 0x41CFC1: lookup_hash_entries
(nodeAgg.c:2107)
==00:00:00:04.149 3088935== by 0x41E39F: agg_retrieve_direct
(nodeAgg.c:2432)
==00:00:00:04.149 3088935== by 0x41E55F: ExecAgg (nodeAgg.c:2161)
==00:00:00:04.149 3088935== by 0x406FBD: ExecProcNode (executor.h:259)
==00:00:00:04.149 3088935== by 0x406FBD: ExecutePlan (execMain.c:1636)

I traced the path of the memory contents (MinimalTuple-derived):
readtup_heap -> READTUP -> mergereadnext -> tuplesort_heap_insert ->
tuplesort_gettuple_common -> ExecStoreMinimalTuple ->
tts_minimal_store_tuple -> tts_minimal_copy_minimal_tuple -> ... ->
tts_minimal_get_minimal_tuple -> ExecFetchSlotMinimalTuple ->
hashagg_spill_tuple -> LogicalTapeWrite (memcpy) -> BufFileWrite (memcpy) ->
BufFileSeek -> BufFileFlush -> BufFileDumpBuffer -> FileWrite

and found that filling tuple->mt_padding:
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -4062,6 +4062,7 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,

/* read in the tuple proper */
tuple->t_len = tuplen;
+ memset(tuple->mt_padding, 0, MINIMAL_TUPLE_PADDING);
LogicalTapeReadExact(tape, tupbody, tupbodylen);
if (state->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing
length

makes the error go away.
But a comment in src/include/access/htup_details.h says:
* MINIMAL_TUPLE_DATA_OFFSET is the offset to the first useful (non-pad)
data
* other than the length word. tuplesort.c and tuplestore.c use this to
avoid
* writing the padding to disk.

So it seems that the padding might still be written out if hash
aggregation
spills data to disk.
Maybe it's worth to add
VALGRIND_CHECK_MEM_IS_DEFINED(ptr, size);
into LogicalTapeWrite() to detect such issues earlier...
Or maybe to place that check in
ExecFetchSlotMinimalTuple/ExecStoreMinimalTuple
to catch uninitialised data in all minimal tuples (but for now it reveals
other uninitialised data manipulations even during 'make check').

Browse pgsql-bugs by date

  From Date Subject
Next Message Hans Buschmann 2023-03-14 18:12:02 AW: BUG #17842: Adding a qual to a working query gets bogus syntax error
Previous Message Tom Lane 2023-03-14 16:20:46 Re: BUG #17842: Adding a qual to a working query gets bogus syntax error