Re: ANY_VALUE aggregate

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ANY_VALUE aggregate
Date: 2022-12-08 12:32:30
Message-ID: e25c63af-bea3-87ab-f93c-f28187dde318@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/8/22 06:48, David G. Johnston wrote:
> On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:
>
>> On 12/7/22 04:22, David G. Johnston wrote:
>>> On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik(at)postgresfriends(dot)org>
>> wrote:
>>>
>>>> On 12/6/22 05:57, David G. Johnston wrote:
>>>>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik(at)postgresfriends(dot)org>
>>>> wrote:
>>>>>
>>>>>> I can imagine an optimization that would remove an ORDER BY clause
>>>>>> because it isn't needed for any other aggregate.
>>>>>
>>>>>
>>>>> I'm referring to the query:
>>>>>
>>>>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
>>>>> // produces 1, per the documented implementation-defined behavior.
>>>>
>>>> Implementation-dependent. It is NOT implementation-defined, per spec.
>>>
>>> I really don't care all that much about the spec here given that ORDER BY
>>> in an aggregate call is non-spec.
>>
>>
>> Well, this is demonstrably wrong.
>>
>> <array aggregate function> ::=
>> ARRAY_AGG <left paren>
>> <value expression>
>> [ ORDER BY <sort specification list> ]
>> <right paren>
>>
>
> Demoable only by you and a few others...

The standard is publicly available. It is strange that we, being so
open, hold ourselves to such a closed standard; but that is what we do.

> We should update our documentation - the source of SQL Standard knowledge
> for mere mortals.
>
> https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
>
> "Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
> function is a PostgreSQL extension."
>
> Apparently only DISTINCT remains as our extension.

Using DISTINCT in an aggregate is also standard. What that note is
saying is that the standard does not allow *both* to be used at the same
time.

The standard defines these things for specific aggregates whereas we are
much more generic about it and therefore have to deal with the combinations.

I have submitted a doc patch to clarify that.

>>> You are de-facto creating a first_value aggregate (which is by definition
>>> non-standard) whether you like it or not.
>>
>>
>> I am de jure creating an any_value aggregate (which is by definition
>> standard) whether you like it or not.
>>
>
> Yes, both statements seem true. At least until we decide to start ignoring
> a user's explicit order by clause.

I ran some tests and including an ORDER BY in an aggregate that doesn't
care (like COUNT) is devastating for performance. I will be proposing a
solution to that soon and I invite you to participate in that
conversation when I do.
--
Vik Fearing

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2022-12-08 13:15:52 Re: Creating HeapTuple from char and date values
Previous Message Andrew Dunstan 2022-12-08 12:20:28 Re: fix and document CLUSTER privileges