Re: Error when using array_agg with filter where clause in pg16 and pg17

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kaimeh <kkaimeh(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Error when using array_agg with filter where clause in pg16 and pg17
Date: 2025-04-09 01:50:48
Message-ID: CAApHDvqi79jXFBeaOSDok8psEkTpzgHzaKje6VCyKVtj=Ss1_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

(I forgot to reply to this part)

On Wed, 9 Apr 2025 at 12:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > The only way I can see to fix that properly is to cost it in during
> > aggregate planning. IIRC, there's no costing for the implicit sorts in
> > Aggref. We could add some of those and put a flag in AggPath which
> > gets propagated to Agg to specify if aggpresorted should be ignored or
> > not for the given Agg node. We'd have to add_path() for both versions
> > of the AggPath and let the cheapest Path win.
>
> Yeah, AFAIR we never did any real costing of aggregate-internal
> sorting. However, adding that would pose the same risk you mentioned
> that some queries might regress due to picking the worse plan.

The difference is that by the time we start generating AggPaths, we
have all the information we need to determine the selectivity of the
aggfilter and apply a sort cost to rows that survive that. So, with
the method I suggest, any poor plan choice is down to bad costing or
stats, whereas if we just disable the optimisation when the Aggref has
a FILTER, as you propose, then we'll always fallback on nodeAgg.c
doing the sorting, even for FILTERs that barely filter anything or
when there's a perfectly good index to give us presorted input.

Just to be clear, the idea I'm proposing for v19 is that we modify
cost_agg() adding a new bool parameter and have it add costs for the
implicit sorts for each Aggref that has an aggdistinct or aggorderby.
The bool parameter would control if aggpresorted Aggrefs were included
for those costs or ignored. We'd then create two AggPaths, one which
would take advantage of presorting and uses a properly sorted input
path and another that ignores the aggpresorted flag and uses the
cheapest input path. add_path() then decides which of those is better.

I'm not following why my in method if the planner chooses a poor plan
is any different from the planner choosing a poor plan for anything
else because the stats or costs aren't a good reflection of reality.

David

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2025-04-09 02:36:26 Re: Error when using array_agg with filter where clause in pg16 and pg17
Previous Message David Rowley 2025-04-09 00:54:50 Re: Error when using array_agg with filter where clause in pg16 and pg17