Re: The documentation for storage type 'plain' actually allows single byte header

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, suchithjn22(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The documentation for storage type 'plain' actually allows single byte header
Date: 2023-10-21 01:48:05
Message-ID: ZTMt1XxPmZekFXBL@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Fri, Sep 29, 2023 at 06:45:52PM -0400, Tom Lane wrote:
> Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
> > On Fri, 2023-09-29 at 18:19 -0400, Bruce Momjian wrote:
> >> Where did we end with this? Is a doc patch the solution?
>
> > I don't think this went anywhere, and a doc patch is not the solution.
> > Tom has argued convincingly that single-byte headers are an effect of the TOAST
> > system, and that STORAGE PLAIN should disable all effects of TOAST.
>
> Well, that was the original idea: you could use STORAGE PLAIN if you
> had C code that wasn't yet toast-aware. However, given the lack of
> complaints, it seems there's no non-toast-aware code left anywhere.
> And that's not too surprising, because the evolutionary pressure to
> fix such code would be mighty strong, and a lot of time has passed.
>
> I'm now inclined to think that changing the docs is better than
> changing the code; we'd be more likely to create new problems than
> fix anything useful.
>
> I wonder though if there's really just one place claiming that
> that's how it works. A trawl through the code comments might
> be advisable.

[ Discussion moved to hackers, same subject. ]

Here is the original thread from pgsql-docs:

https://www.postgresql.org/message-id/flat/167336599095.2667301.15497893107226841625%40wrigleys.postgresql.org

The report is about single-byte headers being used for varlena values
with PLAIN storage.

Here is the reproducible report:

CREATE EXTENSION pageinspect;
CREATE TABLE test(a VARCHAR(10000) STORAGE PLAIN);
INSERT INTO test VALUES (repeat('A',10));

Now peek into the page with pageinspect functions

SELECT left(encode(t_data, 'hex'), 40) FROM
heap_page_items(get_raw_page('test', 0));

This returned value of "1741414141414141414141".
Here the first byte 0x17 = 0001 0111 in binary.
Length + 1 is stored in the length bits (1-7). So Len = 0001011-1 = (11-1)
[base-10] = 10 [base-10]
which exactly matches the expected length. Further the data "41" repeated 10
times also indicates character A (65 or 0x41 in ASCII) repeated 10 times.

I researched this and thought it would be a case where we were lacking a
check before creating a single-byte header, but I couldn't find anything
missing. I think the problem is that the _source_ tupleDesc attstorage
attribute is being used to decide if we should use a short header, while
it is really the storage type of the destination that we should be
checking. Unfortunately, I don't think the destination is accessible at
the location were we are deciding about a short header.

I am confused how to proceed. I feel we need to fully understand why
this happening before we adjust anything. Here is a backtrace --- the
short header is being created in fill_val() and the attstorage value
there is 'x'/EXTENDED.

---------------------------------------------------------------------------

#0 fill_val (att=0x56306f61dae8, bit=0x0, bitmask=0x7ffcfcfc1fb4, dataP=0x7ffcfcfc1f90, infomask=0x56306f61e25c, datum=94766026487048, isnull=false) at heaptuple.c:278
#1 0x000056306e7800eb in heap_fill_tuple (tupleDesc=0x56306f61dad0, values=0x56306f61dc20, isnull=0x56306f61dc28, data=0x56306f61e260 "", data_size=11, infomask=0x56306f61e25c, bit=0x0) at heaptuple.c:427
#2 0x000056306e781708 in heap_form_tuple (tupleDescriptor=0x56306f61dad0, values=0x56306f61dc20, isnull=0x56306f61dc28) at heaptuple.c:1181
#3 0x000056306ea13dcb in tts_virtual_copy_heap_tuple (slot=0x56306f61dbd8) at execTuples.c:280
#4 0x000056306ea1346e in ExecCopySlotHeapTuple (slot=0x56306f61dbd8) at ../../../src/include/executor/tuptable.h:463
#5 0x000056306ea14928 in tts_buffer_heap_copyslot (dstslot=0x56306f61e1a8, srcslot=0x56306f61dbd8) at execTuples.c:798
#6 0x000056306ea4342e in ExecCopySlot (dstslot=0x56306f61e1a8, srcslot=0x56306f61dbd8) at ../../../src/include/executor/tuptable.h:487
#7 0x000056306ea44785 in ExecGetInsertNewTuple (relinfo=0x56306f61d678, planSlot=0x56306f61dbd8) at nodeModifyTable.c:685
#8 0x000056306ea49123 in ExecModifyTable (pstate=0x56306f61d470) at nodeModifyTable.c:3789
#9 0x000056306ea0ef3c in ExecProcNodeFirst (node=0x56306f61d470) at execProcnode.c:464
#10 0x000056306ea03702 in ExecProcNode (node=0x56306f61d470) at ../../../src/include/executor/executor.h:273
#11 0x000056306ea05fe2 in ExecutePlan (estate=0x56306f61d228, planstate=0x56306f61d470, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x56306f588170, execute_once=true) at execMain.c:1670
#12 0x000056306ea03c63 in standard_ExecutorRun (queryDesc=0x56306f527888, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#13 0x000056306ea03aee in ExecutorRun (queryDesc=0x56306f527888, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#14 0x000056306ec70cf5 in ProcessQuery (plan=0x56306f588020, sourceText=0x56306f552a98 "INSERT INTO test VALUES (repeat('A',10));", params=0x0, queryEnv=0x0, dest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:160
#15 0x000056306ec72514 in PortalRunMulti (portal=0x56306f5cccf8, isTopLevel=true, setHoldSnapshot=false, dest=0x56306f588170, altdest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:1277
#16 0x000056306ec71b3a in PortalRun (portal=0x56306f5cccf8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x56306f588170, altdest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:791
#17 0x000056306ec6b465 in exec_simple_query (query_string=0x56306f552a98 "INSERT INTO test VALUES (repeat('A',10));") at postgres.c:1273
#18 0x000056306ec6fdd3 in PostgresMain (dbname=0x56306f58ab88 "test", username=0x56306f50ee68 "postgres") at postgres.c:4657
#19 0x000056306ebb304c in BackendRun (port=0x56306f57dc20) at postmaster.c:4423
#20 0x000056306ebb26fc in BackendStartup (port=0x56306f57dc20) at postmaster.c:4108
#21 0x000056306ebaf134 in ServerLoop () at postmaster.c:1767
#22 0x000056306ebaead5 in PostmasterMain (argc=1, argv=0x56306f50ce30) at postmaster.c:1466
#23 0x000056306ea8108c in main (argc=1, argv=0x56306f50ce30) at main.c:198

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Michael Paquier 2023-10-21 07:11:50 Re: Minipatch concerning tags
Previous Message Maxim Yablokov 2023-10-20 08:22:55 Minipatch concerning tags

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Wienhold 2023-10-21 02:02:21 Re: Fix output of zero privileges in psql
Previous Message Tom Lane 2023-10-21 01:45:28 Re: LLVM 16 (opaque pointers)