Re: Unexpected results when using GROUP BY GROUPING SETS and bind variables

From: Aner Perez <aner(dot)perez+pgsql-general(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected results when using GROUP BY GROUPING SETS and bind variables
Date: 2020-09-09 18:37:22
Message-ID: CAP-7WhM6O2xsBGnpaB9C0XjQL3YzJr5YGVV13rwvhKdPab6--Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Tom,

I figured as much about the second query but I thought it would be safe to
use the first version with the inlined indexing.
I'm not sure if you're saying that the same query without the unindexed
path column in the select would be safe. Like this:

-- Do not GROUP BY or SELECT on path by itself --
SELECT path[1], path[2], path[3], path[4], sum(value)
FROM bind_group_by
GROUP BY GROUPING SETS (
(path[1], path[2], path[3], path[4]),
(path[1], path[2], path[3]),
(path[1], path[2]),
(path[1]),
()
)
ORDER BY 1, 2, 3, 4;

Or if using indexed path elements in the GROUP BY is the issue and I should
put the array indexing in a subselect and do the aggregation in the outer
select. Like this:

-- Safer Subselect Version --
SELECT level1, level2, level3, level4, sum(value)
FROM (
SELECT path[1] as level1, path[2] as level2, path[3] as level3, path[4]
as level4, value
FROM bind_group_by
) AS expanded
GROUP BY GROUPING SETS (
(level1, level2, level3, level4),
(level1, level2, level3),
(level1, level2),
(level1),
()
)
ORDER BY 1, 2, 3, 4;

Thanks for the insight,

- Aner

On Wed, Sep 9, 2020 at 1:13 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Aner Perez <aner(dot)perez+pgsql-general(at)gmail(dot)com> writes:
> > [ these queries don't give the same results: ]
>
> > SELECT path[1], path[2], path[3], path, sum(value)
> > FROM bind_group_by
> > GROUP BY GROUPING SETS (
> > (path[1], path[2], path[3], path),
> > (path[1], path[2], path[3]),
> > (path[1], path[2]),
> > (path[1]),
> > ()
> > )
> > ORDER BY 1, 2, 3, 4
> > ;
>
> > PREPARE prepared_group_by (int, int, int, int, int, int, int, int, int,
> > int, int, int) AS
> > SELECT path[$1], path[$2], path[$3], path, sum(value)
> > FROM bind_group_by
> > GROUP BY GROUPING SETS (
> > (path[$4], path[$5], path[$6], path),
> > (path[$7], path[$8], path[$9]),
> > (path[$10], path[$11]),
> > (path[$12]),
> > ()
> > )
> > ORDER BY 1, 2, 3, 4
> > ;
> > EXECUTE prepared_group_by (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 1);
>
> I think you're kind of shooting yourself in the foot here by writing
> a query with more than one possible interpretation. Since you have
> GROUP BY items that are both the whole "path" array and specific
> elements of it, it's not clear whether a targetlist reference such
> as "path[1]" is meant to refer to the GROUP BY item "path[1]", or
> to be a subscripting operator applied to the GROUP BY item "path".
>
> It appears that the parser makes the first choice in your first
> query, although that seems like an implementation detail that
> I wouldn't care to bet on going forward.
>
> In the second query, the parser is certainly not going to think that
> "path[$1]" matches "path[$4]" or any of the other single-element
> GROUP BY items, so it decides that it means subscripting the "path"
> item. Now the result will be NULL for any grouping set that doesn't
> include the "path" item.
>
> I haven't gone through the results one-by-one, but just by eyeball
> I think this is sufficient to explain the discrepancies.
>
> (I note that the SQL standard only allows GROUP BY items to be
> simple column references. Maybe that's not an arbitrary restriction
> but is meant to forestall this sort of ambiguity? Hard to be sure.)
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2020-09-09 18:45:53 Re: Unexpected results when using GROUP BY GROUPING SETS and bind variables
Previous Message Francisco Olarte 2020-09-09 17:18:48 Re: Network performance optimization