Re: Wrong results with grouping sets

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Paul George <p(dot)a(dot)george19(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-07-18 00:31:18
Message-ID: CAMbWs4_xhRAfy-i==nFMZGukw4M=OnkfwpfEfiGmAx6a3SYBKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 17, 2024 at 8:50 AM Paul George <p(dot)a(dot)george19(at)gmail(dot)com> wrote:
> > Since a subquery is a volatile expression, each of its instances
> should be evaluated separately.

I don't think this conclusion is correct. Look at:

select random(), random() from t group by random();
random | random
--------------------+--------------------
0.7972330769936766 | 0.7972330769936766
(1 row)

> This seems like a valid point, though "query 2" below which groups over a RANDOM() column and outputs an additional RANDOM() column a potential, albeit contrived, counter-example? [NOTE: this was done on Postgres 16.3] I've included a few different combinations of GROUP BYs.

Interesting. I looked into the scenarios with multiple instances of
the same volatile grouping expressions and here is what I observed.

create table t (a int, b int);
insert into t select 1,1;

-- on master, with plain volatile functions
select random() as c1,
random() as c2,
random() as c3
from t t1 group by c1;
c1 | c2 | c3
-------------------+-------------------+-------------------
0.567478050404431 | 0.567478050404431 | 0.567478050404431
(1 row)

So the random() function is evaluated only once, even though it
appears three times.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
(select random() from t t2 where a = t1.a) as c2,
(select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
c1 | c2 | c3
--------------------+--------------------+--------------------
0.8420177313766823 | 0.2969648209746336 | 0.3499675329093421
(1 row)

So on master the subquery is evaluated three times. Why isn't this
consistent with the behavior of the first query?

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
(select random() from t t2 where a = t1.a) as c2,
(select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
c1 | c2 | c3
--------------------+--------------------+--------------------
0.5203586066423254 | 0.5203586066423254 | 0.5203586066423254
(1 row)

So on patched the subquery is evaluated only once, which is consistent
with the behavior of the first query.

Does this suggest that the patched version is more 'correct' for this
case?

Now let's look at the scenario with two grouping keys.

-- on master, with plain volatile functions
select random() as c1,
random() as c2,
random() as c3
from t t1 group by c1, c2;
c1 | c2 | c3
--------------------+--------------------+--------------------
0.9388558105069595 | 0.2900389441597979 | 0.9388558105069595
(1 row)

So the first two random() functions are evaluated independently, and
the third random() function references the result of the first one.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
(select random() from t t2 where a = t1.a) as c2,
(select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
c1 | c2 | c3
---------------------+--------------------+--------------------
0.46275163300894073 | 0.5083760995112951 | 0.6752682696191123
(1 row)

So on master the subquery is evaluated three times.

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
(select random() from t t2 where a = t1.a) as c2,
(select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
c1 | c2 | c3
--------------------+--------------------+--------------------
0.9887848690744176 | 0.9887848690744176 | 0.9887848690744176
(1 row)

So on patched the subquery is evaluated only once.

It seems that in this scenario, neither the master nor the patched
version handles volatile subqueries in grouping expressions the same
way as it handles plain volatile functions.

I am confused. Does the SQL standard explicitly define or standardize
the behavior of grouping by volatile expressions? Does anyone know
about that?

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2024-07-18 00:31:40 Re: Remove dependence on integer wrapping
Previous Message Peter Smith 2024-07-18 00:09:05 Re: Slow catchup of 2PC (twophase) transactions on replica in LR