Re: Make tuple deformation faster

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make tuple deformation faster
Date: 2024-12-04 12:42:36
Message-ID: CAApHDvq=GXX+f75AqSGX4JGvtCAXydsOWzNdOSnQNjxwtnG-Aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 3 Dec 2024 at 16:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> After making some last-minute cosmetic adjustments, I've just pushed
> the 0001 patch.

So that commit broke all of the debug_parallel_query = regress
buildfarm animals. This was due to how I'd changed the
TupleDescData.attrs array to a pointer which I was trying to keep set
to point to the FormData_pg_attribute array which came after the
variable length CompactAttribute array in the TupleDesc. That turned
out to be hard to maintain as in typcache.c, we can get TupleDescs
from shared memory. When that happened, I was adjusting the ->attrs
pointer to the local process's address after the call to
dsa_get_address(), but I didn't consider the fact that that was
changing the shared TupleDesc :-(. Other processes (of course) didn't
like that, that's why the regression tests failed with
debug_parallel_query = regress. The bad attrs address caused issues
during lookups of the RecordCacheHash hash table.

The following was enough to break it:

set debug_parallel_query = regress;
SELECT pg_input_is_valid('34', 'int2');
SELECT pg_input_is_valid('asdf', 'int2');
SELECT pg_input_is_valid('50000', 'int2');
SELECT * FROM pg_input_error_info('50000', 'int2');

-- While we're here, check int2vector as well
SELECT pg_input_is_valid(' 1 3 5 ', 'int2vector');
SELECT * FROM pg_input_error_info('1 asdf', 'int2vector');

In the attached v7-0001 patch, I've now got rid of the
TupleDescData->attrs field. Figuring out the base address of the
FormData_pg_attribute array is now done by the TupleDescAttr() inline
function (this used to be a macro). Because that function is called
often in a tight loop, I wanted to ensure compilers wouldn't calculate
the base address of the array on each iteration of the loop. Looking
at [1], it seems gcc and clang are smart about this and calculate the
base address before the loop and add sizeof(FormData_pg_attribute) to
the register that's being used for the element pointer.

Changing this then caused some other issues... The GIST code was doing
the following to get a TupleDesc without the INCLUDE columns of the
index:

giststate->nonLeafTupdesc = CreateTupleDescCopyConstr(index->rd_att);
giststate->nonLeafTupdesc->natts =
IndexRelationGetNumberOfKeyAttributes(index);

Since I'm calculating the base address of the FormData_pg_attribute
array in TupleDesc by looking at natts, when this code changes natts
on the fly, that means calls to TupleDescAttr end up looking in the
wrong place for the required FormData_pg_attribute element. To fix
this I invented CreateTupleDescTruncatedCopy() and used it in all the
places that were fiddling with the natts field. I also didn't see any
reason to copy the constraints for the truncated TupleDesc. Indexes
can't have constraints.

The attached v7-0001 patch is the previous patch adjusted as per
above. v7-0002 is not for commit. That's me cheekily exploiting the
CFBot's CI testing machines to run in debug_parallel_query = regress
mode.

David

[1] https://godbolt.org/z/dvnGfnqxz

Attachment Content-Type Size
v7-0001-Introduce-CompactAttribute-array-in-TupleDesc.patch application/octet-stream 32.9 KB
v7-0002-Only-for-testing.patch application/octet-stream 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthonin Bonnefoy 2024-12-04 12:52:05 Re: Add Pipelining support in psql
Previous Message Yugo Nagata 2024-12-04 12:08:26 Re: Doc: clarify the log message level of the VERBOSE option