Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date: 2015-12-24 01:09:58
Message-ID: CAM3SWZQyWTnhkU2rAO5C_L7hrpRRabmvN2JEC6Lswd=UVMvveA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2015 at 10:49 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> It has some nice properties, because unsigned integers are so simple
>>> and flexible. For example, we can do it with int4 sometimes, and then
>>> compare two attributes at once on 64-bit platforms. Maybe the second
>>> attribute (the second set of 4 bytes) isn't an int4, though -- it's
>>> any other type's abbreviated key (which can be assumed to have a
>>> compatible representation). That's one more advanced possibility.
>>
>> Yikes.
>
> I'm not suggesting we'd want to do that immediately, or even at all.
> My point is -- stuff like this becomes possible. My intuition is that
> it might come up in a bunch of places.

I had a better idea, that is based on the same intuition that we can
leverage the flexibility of that standardized representation of
abbreviated keys to the hilt.

We use one bit to represent whether this is the first run, or a
subsequent run (in a world where replacement selection is only used
for "quicksort with spillover", and run number isn't that
interesting), and another bit to represent NULLness. These are the
most significant and second most significant bits, respectively. Fill
all remaining bits with your unsigned abbreviated key representation,
without regard to the underlying details of the type. Now, you just
found a way to cut the size of SortTuples significantly, to just two
pointers (whereas with alignment considerations, SortTuples currently
consume enough memory to store 3 pointers). You also just found a way
of significantly cutting the number of instructions that are needed
for the hot code path, because you don't really need an
abbreviation-based ApplySortComparator() inline function to interpret
things through -- whether or not NULLs are first or last, and how
NULLs are considered generally is all baked into the abbreviated
representation from the start.

This is certainly speculative stuff, and having multiple versions of
SortTuple would be inconvenient, but the point again is that there are
considerable upsides to a standardized representation (abbreviated
keys always being compared as unsigned integers), and no downsides.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-12-24 01:28:59 Re: Combining Aggregates
Previous Message Tomas Vondra 2015-12-24 01:08:23 Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?