From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reduce TupleHashEntryData struct size by half |
Date: | 2025-02-13 01:01:29 |
Message-ID: | ed493e174e269f981b4a0b3109e3dc6db647e06f.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 2025-02-07 at 17:13 -0800, Jeff Davis wrote:
> On Sun, 2025-01-12 at 14:54 +1300, David Rowley wrote:
> > I wonder if there's some other better way of doing this. Would it
> > be
> > worth having some function like ExecCopySlotMinimalTuple() that
> > accepts an additional parameter so that the palloc allocates N more
> > bytes at the end?
>
> Attached a new series that implements this idea.
Attached a new patchset that doesn't change the API for
ExecCopySlotMinimalTuple(). Instead, I'm using
ExecFetchSlotMinimalTuple(), which avoids the extra memcpy if the slot
is TTSOpsMinimalTuple, which is what HashAgg uses. I separated out the
change to use ExecFetchSlotMinimalTuple() into 0004 to illustrate the
performance impact.
DATA:
create table a (a text not null);
insert into a select repeat(md5(a::text),10)
from generate_Series(1,1000) a;
vacuum freeze analyze a;
create table b (b text not null);
insert into b select repeat(md5(b::text),10)
from generate_Series(1001,2000) b;
vacuum freeze analyze b;
QUERY: select a,count(*) from a group by a;
master: tps = 2054.742658 (without initial connection time)
0001: tps = 2068.620429 (without initial connection time)
0002: tps = 2027.046422 (without initial connection time)
0003: tps = 1951.392904 (without initial connection time)
0004: tps = 2071.690037 (without initial connection time)
QUERY: select * from a except select * from b;
master: tps = 1720.168862 (without initial connection time)
0001: tps = 1703.040810 (without initial connection time)
0002: tps = 1687.191715 (without initial connection time)
0003: tps = 1579.975535 (without initial connection time)
0004: tps = 1616.741447 (without initial connection time)
So the GROUP BY query has no regression (because there's no additional
copy) and the EXCEPT query has about a 6% regression.
The v6 series doesn't have that regression for set operations, but it
requires the somewhat-messy ExecCopySlotMinimalTupleExtra() API. If you
think you'll use that in nodeMemoize, then the API change is worth it.
If not, it feels a bit like a hack.
Another thing I did in the v7 series is I stored the additional data
before the firstTuple. It seems cleaner to store the fixed-length data
first -- I could do the same thing if we go with
ExecCopySlotMinimalTupleExtra().
In any case, it seems like we have agreement to switch to the Bump
context, so I'll do another round of tests to see if there are any
downsides, then clean it up and commit v7-0001.
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
v7-0001-HashAgg-use-Bump-allocator-for-hash-TupleHashTabl.patch | text/x-patch | 10.9 KB |
v7-0002-Refactor-accessors-for-TupleHashEntryData.patch | text/x-patch | 11.8 KB |
v7-0003-Combine-firstTuple-and-additionalsize-into-a-sing.patch | text/x-patch | 4.1 KB |
v7-0004-Use-ExecFetchSlotMinimalTuple.patch | text/x-patch | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | ahmedashour | 2025-02-13 01:13:57 | [PATCH] Optimize SP-GiST text leaf comparisons with memcmp |
Previous Message | Melanie Plageman | 2025-02-13 00:40:30 | Re: BitmapHeapScan streaming read user and prelim refactoring |