Re: weird GROUPING SETS and ORDER BY behaviour

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: weird GROUPING SETS and ORDER BY behaviour
Date: 2024-01-06 19:49:16
Message-ID: 1345131.1704570556@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> Something does seem off here with the interaction between grouping sets and
> order by.

Yeah. I think Geoff is correct to identify the use of subqueries in
the grouping sets as the triggering factor. We can get some insight
by explicitly printing the ordering values:

SELECT
GROUPING(test1.n) AS gp_n,
GROUPING(concat(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq)))
AS gp_conc,
test1.n,
CONCAT(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq)),
CASE WHEN GROUPING(test1.n)=0 THEN test1.n ELSE NULL END as o1,
CASE WHEN GROUPING(concat(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq)))=0 THEN concat(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq)) ELSE NULL END as o2
FROM test1
GROUP BY
GROUPING SETS(
(test1.n),
(concat(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq)))
)
ORDER BY o1 NULLS FIRST, o2 NULLS FIRST;

which produces

gp_n | gp_conc | n | concat | o1 | o2
------+---------+----+--------+----+----
1 | 0 | | n5x1 | | x1
1 | 0 | | n4x2 | | x2
1 | 0 | | n3x3 | | x3
1 | 0 | | n2x4 | | x4
1 | 0 | | n1x5 | | x5
0 | 1 | n1 | | n1 |
0 | 1 | n2 | | n2 |
0 | 1 | n3 | | n3 |
0 | 1 | n4 | | n4 |
0 | 1 | n5 | | n5 |
(10 rows)

Those columns appear correctly sorted, so it's not the sort that
is misbehaving. But how come the values don't match the "concat"
column where they should? EXPLAIN VERBOSE gives a further clue:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=53622.76..53623.76 rows=400 width=136)
Output: (GROUPING(test1.n)), (GROUPING(concat(test1.n, (SubPlan 1)))), test1.n, (concat(test1.n, (SubPlan 2))), (CASE WHEN (GROUPING(test1.n) = 0) THEN test1.n ELSE NULL::text END), (CASE WHEN (GROUPING(concat(test1.n, (SubPlan 3))) = 0) THEN concat(test1.n, (SubPlan 4)) ELSE NULL::text END)
Sort Key: (CASE WHEN (GROUPING(test1.n) = 0) THEN test1.n ELSE NULL::text END) NULLS FIRST, (CASE WHEN (GROUPING(concat(test1.n, (SubPlan 3))) = 0) THEN concat(test1.n, (SubPlan 4)) ELSE NULL::text END) NULLS FIRST
-> HashAggregate (cost=32890.30..53605.48 rows=400 width=136)
Output: GROUPING(test1.n), GROUPING(concat(test1.n, (SubPlan 1))), test1.n, (concat(test1.n, (SubPlan 2))), CASE WHEN (GROUPING(test1.n) = 0) THEN test1.n ELSE NULL::text END, CASE WHEN (GROUPING(concat(test1.n, (SubPlan 3))) = 0) THEN concat(test1.n, (SubPlan 4)) ELSE NULL::text END
Hash Key: test1.n
Hash Key: concat(test1.n, (SubPlan 2))
-> Seq Scan on public.test1 (cost=0.00..32887.12 rows=1270 width=68)
Output: test1.n, concat(test1.n, (SubPlan 2)), test1.seq
SubPlan 2
-> Seq Scan on public.test2 (cost=0.00..25.88 rows=6 width=32)
Output: test2.x
Filter: (test2.seq = test1.seq)
SubPlan 4
-> Seq Scan on public.test2 test2_1 (cost=0.00..25.88 rows=6 width=32)
Output: test2_1.x
Filter: (test2_1.seq = test1.seq)
(17 rows)

We have ended up with four distinct SubPlans (two of which seem to
have gotten dropped because they are inside GROUPING functions,
which never really evaluate their arguments). What I think happened
here is that the parser let the concat() expressions in the targetlist
and ORDER BY through because they were syntactically identical to
GROUPING SET expresions --- but later on, the planner expanded each
of the sub-selects to a distinct SubPlan, and that meant that those
subexpressions were no longer identical, and so most of them didn't
get converted to references to the grouping key column output by the
HashAggregate node. Because they didn't get converted, they fail to
do the right thing in rows where they should go to NULL because
they're from the wrong grouping set. The "concat" targetlist element
did get converted, so it behaves correctly, and the GROUPING functions
don't actually care because they have a different method for
determining what they should output. But you get the wrong answer for
the concat() inside the "o2" expression: it gets evaluated afresh
using the nulled value of test1.n.

I think this particular symptom might be new, but we've definitely
seen related trouble reports before. I'm inclined to think that the
right fix will require making the parser actually replace such
expressions with Vars referencing a notional grouping output relation,
so that there's not multiple instances of the sub-query in the parser
output in the first place. That's a fairly big job and nobody's
tackled it yet.

In the meantime, what I'd suggest as a workaround is to put those
subexpressions into a sub-select with an optimization fence (you
could use OFFSET 0 or a materialized CTE), so that the grouping
sets list in the outer query just has simple Vars as elements.

[Digression: the SQL spec *requires* grouping set elements to be
simple Vars, and I begin to see why when contemplating examples like
this. It's a little weird that "concat(test1.n, ...)" can evaluate
with a non-null value of test1.n in a row where test1.n alone would
evaluate as null. However, we've dug this hole for ourselves and now
we have to deal with the consequences.]

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-01-06 20:10:59 Re: Password leakage avoidance
Previous Message Joe Conway 2024-01-06 18:31:22 Re: Password leakage avoidance