From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reduce TupleHashEntryData struct size by half |
Date: | 2024-11-18 10:13:50 |
Message-ID: | 0f3e7407-fb19-42c0-ad10-46eace6c1399@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 18/11/2024 02:01, Jeff Davis wrote:
>
> TupleHashEntryData is the size of a hash bucket, and it's currently 24
> bytes. The size of the structure is important for HashAgg and other
> callers that can build up large hashtables of small tuples. Waste in
> this structure is even worse when the hash table is sparse, because the
> space is consumed whether the bucket is occupied or not.
>
> The attached patch series brings it down to 12. There are several
> unappealing hacks, so ideally we'd find something cleaner, but halving
> the bucket size seems quite desirable if it can be done in an
> acceptable way.
Makes sense.
> 0001: This patch makes the structure private, so that it's easier to
> control the way the structure is accessed.
Seems pretty uncontroversial. You removed the typedef for struct
TupleHashEntryData, which is a bit unusual for our usual source style.
Was there a reason for that?
> 0002: Removes the "additional" pointer, instead storing it right after
> the tuple, which is already stored in a separate chunk. Hack: this
> crams the "additional" pointer after the MinimalTuple in the same chunk
> of memory to avoid adding additional palloc()s.
Hmm, it would seem more straightforward to store it in the beginning,
i.e. have something like this:
struct {
void *additional;
MinimalTupleData mtup;
} ;
Come to think of it, how important is it that we use MinimalTuple here
at all? Some other representation could be faster to deal with in
TupleHashTableMatch() anyway.
> 0003: simplehash: allow the caller to decide which entries are empty vs
> in-use rather than requiring a separate "status" field. This may limit
> other possible status values in the future (i.e. adding on to the
> enum), but I'm not sure what those other stutus values might be.
+1. I've wanted to have this in the past.
> 0004: Removes the "status" field from TupleHashEntryData, using
> firstTuple==NULL to mean "empty", otherwise "in use". Hack: need an
> additional "special" pointer value to mean "input slot" now that NULL
> means "empty".
+1
> 0005: Pack TupleHashEntryData. IIUC, this is fine even on machines that
> can't do unaligned access, so long as we are accessing the fields
> through the struct, and not taking the address of individual members.
Seems OK.
I wonder if the compiler understands that the elements are still 4-byte
aligned, or if it forces byte-per-byte access? Playing with godbolt a
little, it seems like GCC at least understands it, but clang does not.
On architectures with non-strict alignment, it doesn't matter as a
simple load/store instruction is the fastest option anyway.
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Shubham Khanna | 2024-11-18 10:17:03 | Re: Improve the error message for logical replication of regular column to generated column. |
Previous Message | wenhui qiu | 2024-11-18 10:00:59 | Auto Vacuum optimisation |