Re: Wrong results with grouping sets

From: Paul George <p(dot)a(dot)george19(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-07-17 00:50:15
Message-ID: CALA8mJqNeiexX3t_G555_m=nOMS9261zQ_W8r+6LRRbt3q8CLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the work!

> Since a subquery is a volatile expression, each of its instances
should be evaluated separately.

This seems like a valid point, though "query 2" below which groups over a
RANDOM() column and outputs an additional RANDOM() column a potential,
albeit contrived, counter-example? [NOTE: this was done on Postgres 16.3]
I've included a few different combinations of GROUP BYs.

-- setup
create table t as (select 0 x);
analyze t;

-- query 1: base --> multiple evaluations of RANDOM(), col0 != col1
postgres=# select x, random() col0, random() col1 from t group by x;
x | col0 | col1
---+---------------------+--------------------
0 | 0.07205921113992653 | 0.9847359546402477
(1 row)

-- query 2: group by one volatile column --> single evaluation of RANDOM(),
col0 == col1
postgres=# select x, random() col0, random() col1 from t group by x, col0;
x | col0 | col1
---+--------------------+--------------------
0 | 0.7765600922298943 | 0.7765600922298943
(1 row)

-- query 3: group by both volatile columns --> multiple evaluations of
RANDOM() again, col0 != col1
postgres=# select x, random() col0, random() col1 from t group by x, col0,
col1;
x | col0 | col1
---+---------------------+--------------------
0 | 0.07334303548896548 | 0.6528967617521189
(1 row)

--

Related to your point about the unexpected asymmetry in single vs multiple
evaluations of subquery plans, I'm curious if the pair of subqueries in
both examples below should be considered equivalent? The queries output the
same results and the subqueries differ only in output name. With this
patch, they're considered equivalent in the first query but not in the
second. [NOTE: this was done on a branch with the patch applied]

-- query 1: alias outside subquery
test=# explain (verbose, costs off) select x, (select 1) col0, (select 1)
col1 from t group by x, col0;
QUERY PLAN
-----------------------------------------------------
Group
Output: t.x, (InitPlan 1).col1, (InitPlan 1).col1
Group Key: t.x
InitPlan 1
-> Result
Output: 1
-> Sort
Output: t.x
Sort Key: t.x
-> Seq Scan on public.t
Output: t.x
(11 rows)

...compared to...

-- query 2: alias inside subquery
test=# explain (verbose, costs off) select x, (select 1 col0), (select 1
col1) from t group by x, col0;
QUERY PLAN
-----------------------------------------------------
Group
Output: t.x, (InitPlan 1).col1, (InitPlan 2).col1
Group Key: t.x
InitPlan 1
-> Result
Output: 1
InitPlan 2
-> Result
Output: 1
-> Sort
Output: t.x
Sort Key: t.x
-> Seq Scan on public.t
Output: t.x
(14 rows)

-Paul-

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-07-17 00:58:51 RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message Alexander Korotkov 2024-07-17 00:45:07 Re: Removing unneeded self joins