Re: Expand applicability of aggregate's sortop optimization

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, ilmari(at)ilmari(dot)org
Subject: Re: Expand applicability of aggregate's sortop optimization
Date: 2024-07-18 12:49:15
Message-ID: CAEze2WhZkbupd35vB8cUJsyNhED-Rd9isvRGjskZ+MO14fo8xA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 17 Jul 2024 at 16:09, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 17/7/2024 16:33, Matthias van de Meent wrote:
> > On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >> Thanks for the job! I guess you could be more brave and push down also
> >> FILTER statement.
> >
> > While probably not impossible, I wasn't planning on changing this code
> > with new optimizations; just expanding the applicability of the
> > current optimizations.
> Got it>> As I see, the code:
> >> aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
> >> if (!OidIsValid(aggsortop))
> >> return false; /* not a MIN/MAX aggregate */
> >>
> >> used twice and can be evaluated earlier to avoid duplicated code.
> >
> > The code is structured like this to make sure we only start accessing
> > catalogs once we know that all other reasons to bail out from this
> > optimization indicate we can apply the opimization. You'll notice that
> > I've tried to put the cheapest checks that only use caller-supplied
> > information first, and catalog accesses only after that.
> >
> > If the fetch_agg_sort_op clause would be deduplicated, it would either
> > increase code complexity to handle both aggref->aggorder paths, or it
> > would increase the cost of planning MAX(a ORDER BY b) because of the
> > newly added catalog access.
> IMO it looks like a micro optimisation. But I agree, it is more about
> code style - let the committer decide what is better.>> Also, I'm unsure
> about the necessity of looking through the btree
> >> classes. Maybe just to check the commutator to the sortop, like in the
> >> diff attached? Or could you provide an example to support your approach?
> >
> > I think it could work, but I'd be hesitant to rely on that, as
> > commutator registration is optional (useful, but never required for
> > btree operator classes' operators). Looking at the btree operator
> > class, which is the definition of sortability in PostgreSQL, seems
> > more suitable and correct.
> Hm, I dubious about that. Can you provide an example which my variant
> will not pass but your does that correctly?

Here is one:

"""
CREATE OPERATOR @@> (
function=int4gt, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@>= (
function=int4ge, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@= (
function=int4eq, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@<= (
function=int4le, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@< (
function=int4lt, leftarg=int4, rightarg=int4
);

CREATE OPERATOR CLASS my_int_ops
FOR TYPE int
USING btree AS
OPERATOR 1 @<@,
OPERATOR 2 @<=@,
OPERATOR 3 @=@,
OPERATOR 4 @>=@,
OPERATOR 5 @>@,
FUNCTION 1 btint4cmp;

CREATE AGGREGATE my_int_max (
BASETYPE = int4,
SFUNC = int4larger,
STYPE = int4,
SORTOP = @>@
);

CREATE TABLE my_table AS
SELECT id::int4 FROM generate_series(1, 10000) id;

CREATE INDEX ON my_table (id my_int_ops);

SELECT my_int_max(id ORDER BY id USING @<@ ) from my_table;
"""

Because the @<@ and @>@ operators are not registered as commutative,
it couldn't apply the optimization in your patch, while the btree
operator check does allow it to apply the optimization.

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2024-07-18 12:55:23 Re: CI, macports, darwin version problems
Previous Message Joe Conway 2024-07-18 12:00:56 Re: CI, macports, darwin version problems