Re: Reduce TupleHashEntryData struct size by half

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reduce TupleHashEntryData struct size by half
Date: 2024-11-18 20:22:28
Message-ID: 7c3224111d9edcc16e8e3c8242de5a82c4ee2a73.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2024-11-18 at 12:13 +0200, Heikki Linnakangas wrote:
> Seems pretty uncontroversial. You removed the typedef for struct
> TupleHashEntryData, which is a bit unusual for our usual source
> style.
> Was there a reason for that?

If it's private to a file and I don't intend to use it a lot, I do it
to cut down typedefs.list bloat. I'll go ahead and add the typedef back
to match the style, though.

>
> Hmm, it would seem more straightforward to store it in the beginning,
> i.e. have something like this:
>
> struct {
>       void *additional;
>       MinimalTupleData mtup;
> } ;

That was my first approach, but it requires an additional memcpy,
because ExecCopySlotMinimalTuple() does it's own palloc. I used
repalloc() because it will often have space at the end of the chunk
anyway, and not need to memcpy(). Maybe that's not significant but it
did seem detectable in some perf tests.

But perhaps we can go further and get rid of the "additional" pointer
and inline the pergroup data and the grouping key tuple into the same
palloc chunk? That would cut out a palloc and the 8 wasted bytes on the
pointer.

> Come to think of it, how important is it that we use MinimalTuple
> here
> at all? Some other representation could be faster to deal with in
> TupleHashTableMatch() anyway.

What did you have in mind? That sounds like a good idea orthogonal to
reducing the bucket size.

Alternatively, MinimalTuple is not very "minimal", and perhaps we can
just make it better.

>
> > 0004: Removes the "status" field from TupleHashEntryData, using
> > firstTuple==NULL to mean "empty", otherwise "in use". Hack: need an
> > additional "special" pointer value to mean "input slot" now that
> > NULL
> > means "empty".
>
> +1

For the FIRSTTUPLE_INPUTSLOT marker, do you think it's cleaner to use
what I did:

const static MinimalTuple FIRSTTUPLE_INPUTSLOT = (MinimalTuple) 0x1;

or something like:

static MinimalTupleData dummy = {0};
const static MinimalTuple FIRSTTUPLE_INPUTSLOT = &dummy;

?

>
> On architectures with non-strict alignment, it doesn't matter as a
> simple load/store instruction is the fastest option anyway.

My intuition is that the cost of dereferencing that pointer (to memory
which is not expected to be in cache) is going to be way higher than
the cost of a couple extra instructions to do the unaligned access.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-11-18 20:32:30 Re: Statistics Import and Export
Previous Message Torsten Förtsch 2024-11-18 20:21:56 PGSERVICEFILE as part of a normal connection string