Re: behavior of GROUP BY with VOLATILE expressions

From: Paul George <p(dot)a(dot)george19(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: behavior of GROUP BY with VOLATILE expressions
Date: 2024-07-19 23:32:22
Message-ID: CALA8mJq9sJKw3p=gKf40D-8M43VJzBHN0mus1zo55ASfGn02tw@mail.gmail.com
Lists: pgsql-hackers

David:

> Only now just grasping that you are trying to group something that is
> definitionally random. That just doesn't make sense to me.

Oh, sorry for the confusion. Yeah, totally. I didn't mean to draw specific
attention to GROUP BY -- as you've pointed out elsewhere this issue also
exists with ORDER BY.

To clean this up a bit: I'm specifically comparing how volatile function
calls and volatile scalar subqueries are evaluated differently (covered in
the prior links you've provided). Here,

postgres=# select random(), random() order by random();
      random       |      random
-------------------+-------------------
 0.956989895473876 | 0.956989895473876
(1 row)

and, here,

postgres=# select (select random()), (select random()) order by (select random());
       random       |       random
--------------------+--------------------
 0.2872914386383745 | 0.8976525075618966
(1 row)

Regarding documentation, I think those changes would be useful. There's
this suggestion

"An expression or subexpression in
the SELECT list that matches an ORDER BY or GROUP BY item is taken to represent
the same value that was sorted or grouped by, even when the
(sub)expression is volatile".

and this one,

"A side-effect of this feature is that ORDER BY expressions containing
volatile functions will execute the volatile function only once for the
entire row; thus any column expressions using the same function will reuse
the same function result."

But I don't think either covers the additional, albeit nuanced, case of
volatile scalar subqueries.
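
For anyone who wants to inspect the difference rather than infer it from
the results, a minimal sketch is to run both queries under EXPLAIN
(VERBOSE) and compare the Output lists of the plan nodes. The statements
below only ask for the plans; what they show is deliberately not
reproduced here, and the expectation that each scalar subquery appears as
its own InitPlan is an assumption to verify on your own server version.

```sql
-- Case 1: plain volatile function calls. The two SELECT-list entries and
-- the ORDER BY item are all the identical expression random().
EXPLAIN (VERBOSE, COSTS OFF)
SELECT random(), random() ORDER BY random();

-- Case 2: volatile scalar subqueries. Each (SELECT random()) is a separate
-- subquery (assumption: shown as a separate InitPlan in the plan output).
EXPLAIN (VERBOSE, COSTS OFF)
SELECT (SELECT random()), (SELECT random()) ORDER BY (SELECT random());
```

Comparing the Output lines of the two plans should make it visible whether
the SELECT-list entries are computed independently or refer back to the
value that was sorted on.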

-Paul-

On Fri, Jul 19, 2024 at 2:28 PM David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Fri, Jul 19, 2024 at 2:21 PM Paul George <p(dot)a(dot)george19(at)gmail(dot)com>
> wrote:
>
>> Great, thanks for the links and useful past discussions! I figured I
>> wasn't the first to stumble across this, and it's interesting to see the
>> issue arise with ORDER BY [VOLATILE FUNC] as well.
>>
>> My question was not so much about changing behavior as it was about
>> understanding what is desired, especially in light of the fact that
>> subqueries behave differently. From my reading of the links you provided,
>> it seems that even the notion of "desired" here is itself dubious and that
>> there is a case for reevaluating RANDOM() everywhere and a case for not
>> doing that. Given this murkiness, is it fair then to say that drawing
>> parallels between how GROUP BY subquery is handled is moot?
>>
>
> Only now just grasping that you are trying to group something that is
> definitionally random. That just doesn't make sense to me. Grouping is
> for categorical data (loosely defined, something like Invoice# arguably
> counts as a category if you are looking at invoice details.)
>
> I'll stick with: this whole area, implementation-wise, is going to remain
> status-quo. If you've got ideas for documenting it better hopefully a
> patch goes in at some point. Mostly that can be done black-box style -
> inputs and outputs, not code reading.
>
> David J.
>
>
