Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2021-07-12 23:04:26
Message-ID: CAEudQAq-H2yNzCni1N_PW7PfrWr4jYMhzRAiQiGyAQ4vt4-sTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em seg., 12 de jul. de 2021 às 09:04, David Rowley <dgrowleyml(at)gmail(dot)com>
escreveu:

> On Sun, 13 Jun 2021 at 03:07, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >
> > Please find attached my WIP patch. It's WIP due to what I mentioned
> > in the above paragraph and also because I've not bothered to add JIT
> > support for the new expression evaluation steps.
>
> I've split this patch into two parts.
>
Hi, I'll take a look.

> 0001 Adds planner support for ORDER BY aggregates.
>
/* Normal transition function without ORDER BY / DISTINCT. */
Is it possible to avoid entering to initialize args if 'argno >=
pertrans->numTransInputs'?
Like this:

if (!pertrans->aggsortrequired && argno < pertrans->numTransInputs)

And if argos is '>' that pertrans->numTransInputs?
The test shouldn't be, inside the loop?

+ /*
+ * Don't initialize args for any ORDER BY clause that might
+ * exist in a presorted aggregate.
+ */
+ if (argno >= pertrans->numTransInputs)
+ break;

I think that or can reduce the scope of variable 'sortlist' or simply
remove it?

a)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+ List *sortlist;
+

or better

b)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+
+ if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
+ continue;
+
+ /* DISTINCT aggregates not yet supported by the planner */
+ if (aggref->aggdistinct != NIL)
+ continue;
+
+ if (aggref->aggorder == NIL)
+ continue;
+
+ /*
+ * Find the pathkeys with the most sorted derivative of the first
+ * Aggref. For example, if we determine the pathkeys for the first
+ * Aggref to be {a}, and we find another with {a,b}, then we use
+ * {a,b} since it's useful for more Aggrefs than just {a}. We
+ * currently ignore anything that might have a longer list of
+ * pathkeys than the first Aggref if it is not contained in the
+ * pathkeys for the first agg. We can't practically plan for all
+ * orders of each Aggref, so this seems like the best compromise.
+ */
+ if (pathkeys == NIL)
+ {
+ pathkeys = make_pathkeys_for_sortclauses(root, aggref->aggorder ,
+ aggref->args);
+ aggref->aggpresorted = true;
+ }
+ else
+ {
+ List *pathkeys2 = make_pathkeys_for_sortclauses(root,
+ aggref->aggorder,
+ aggref->args);

> 0002 is a WIP patch for DISTINCT support. This still lacks JIT
> support and I'm still not certain of the best where to store the
> previous value or tuple to determine if the current one is distinct
> from it.
>
In the patch 0002, I think that can reduce the scope of variable 'aggstate'?

+ EEO_CASE(EEOP_AGG_PRESORTED_DISTINCT_SINGLE)
+ {
+ AggStatePerTrans pertrans = op->d.agg_presorted_distinctcheck.pertrans;
+ Datum value = pertrans->transfn_fcinfo->args[1].value;
+ bool isnull = pertrans->transfn_fcinfo->args[1].isnull;
+
+ if (!pertrans->haslast ||
+ pertrans->lastisnull != isnull ||
+ !DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
+ pertrans->aggCollation,
+ pertrans->lastdatum, value)))
+ {
+ if (pertrans->haslast && !pertrans->inputtypeByVal)
+ pfree(DatumGetPointer(pertrans->lastdatum));
+
+ pertrans->haslast = true;
+ if (!isnull)
+ {
+ AggState *aggstate = castNode(AggState, state->parent);
+
+ /*
+ * XXX is it worth having a dedicted ByVal version of this
+ * operation so that we can skip switching memory contexts
+ * and do a simple assign rather than datumCopy below?
+ */
+ MemoryContext oldContext;
+
+ oldContext =
MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);

What do you think?

regards,
Ranier Vilela

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-07-12 23:16:35 Re: proposal: possibility to read dumped table's name from file
Previous Message Josef Šimánek 2021-07-12 23:02:19 Re: [PATCH] Don't block HOT update by BRIN index