Re: Ordering behavior for aggregates

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Subject: Re: Ordering behavior for aggregates
Date: 2022-12-13 13:05:10
Message-ID: abb988eb-6a88-4b63-4703-36f9a7b97133@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/13/22 13:55, Magnus Hagander wrote:
> On Tue, Dec 13, 2022 at 1:51 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:
>
>> The standard only defines an ORDER BY clause inside of an aggregate for
>> ARRAY_AGG(). As an extension to the standard, we allow it for all
>> aggregates, which is very convenient for non-standard things like
>> string_agg().
>>
>> However, it is completely useless for things like AVG() or SUM(). If
>> you include it, the aggregate will do the sort even though it is neither
>> required nor desired.
>>
>> I am proposing something like pg_aggregate.aggordering which would be an
>> enum of behaviors such as f=Forbidden, a=Allowed, r=Required. Currently
>> all aggregates would have 'a' but I am thinking that a lot of them could
>> be switched to 'f'. In that case, if a user supplies an ordering, an
>> error is raised.
>>
>
> Should there perhaps also be an option for "ignored" where we'd allow the
> user to specify it, but not actually do the sort because we know it's
> pointless? Or maybe that should be the behaviour of "forbidden", which
> should then perhaps have a different name?

I did think about that but I can't think of any reason we would want to
silently ignore something the user has written. If the ordering doesn't
make sense, we should forbid it.

> My main motivation behind this is to be able to optimize aggregates that
>> could stop early such as ANY_VALUE(), but also to self-optimize queries
>> written in error (or ignorance).
>>
>> There is recurring demand for a first_agg() of some sort, and that one
>> (whether implemented in core or left to extensions) would use 'r' so
>> that an error is raised if the user does not supply an ordering.
>>
>> I have not started working on this because I envision quite a lot of
>> bikeshedding, but this is the approach I am aiming for.
>>
>> Thoughts?
>>
>
> For consistency, should we have a similar flag for DISITNCT? That could be
> interesting to forbid for something like first_agg() wouldn't it? I'm not
> sure what the usecase would be to require it, but maybe there is one?

I thought about that too, but decided it could be a separate patch
because far fewer aggregates would need it.
--
Vik Fearing

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-12-13 13:07:06 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message Magnus Hagander 2022-12-13 12:55:27 Re: Ordering behavior for aggregates