From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Daniel Farkaš <daniel(dot)farkas(at)datoris(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17502: View based on window functions returns wrong results when queried |
Date: | 2022-05-30 01:11:53 |
Message-ID: | CAApHDvqKO00nNQtchBs65VTfjEMUsYiB5r2P4VUreTdXE9RY1g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, 30 May 2022 at 10:30, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I'm a little bit uncertain if the correct fix is to have
> expression_returns_set() look deeper into WindowFuncs to check if the
> WindowClause that the function belongs to has any SRFs in the
> PARTITION BY / ORDER BY clause. Unfortunately, doing that means
> having to pass the PlannerInfo to expression_returns_set(). I don't
> quite see how that could be made to work in the back branches.
There does seem to be other weird anomalies as a result of
expression_returns_set() not looking for SRFs in the WindowFunc
PARTITION BY / ORDER BY clause.
Looking at the usage of expression_returns_set() in
make_sort_input_target(), I'd have expected the following 2 queries to
act in the same way:
setup:
create table ab (a int, b int);
insert into ab values(1,1),(1,2),(2,1),(2,2);
postgres=# select distinct on (a) a,b, row_number() over (order by
generate_Series(1,2)) from ab order by a,b;
a | b | row_number
---+---+------------
1 | 1 | 1
2 | 1 | 7
(2 rows)
postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)
The fact that those are different is starting to make me think that
expression_returns_set() should be looking into WindowFuncs to see if
the WindowClause has SRFs rather than adding special code to
remove_unused_subquery_outputs() in order to fix the bug being
reported on this thread.
However, I'm not sure the latter of the above two queries result makes
any sense given that our documentation [1] claims:
"PostgreSQL's behavior for a set-returning function in a query's
select list is almost exactly the same as if the set-returning
function had been written in a LATERAL FROM-clause item instead. For
example,"
the "almost" is later defined as:
"It would be exactly the same, except that in this specific example,
the planner could choose to put g on the outside of the nested-loop
join, since g has no actual lateral dependency on tab."
which only indicates that the order of the results may differ, not the
actual results themselves.
You can see the LATERAL version only returns 2 rows:
postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b;
a | b | s
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)
The following is also pretty strange. Why should adding the SRF column
to the ORDER BY change the number of output rows?
postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b,3 desc;
a | b | generate_series
---+---+-----------------
1 | 1 | 2
2 | 1 | 2
(2 rows)
postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)
The LATERAL version of this will return 2 rows regardless of what the
ORDER BY says:
postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b,g.s DESC;
a | b | s
---+---+---
1 | 1 | 2
2 | 1 | 2
(2 rows)
postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b;
a | b | s
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)
In addition to what the documentation claims about SRFs in the target
list being the same as LATERAL, maybe the following should give us
some guidance on if the non-lateral SRF queries above should return 2
or 4 rows:
postgres=# select distinct generate_Series(0,1)/2;
?column?
----------
0
(1 row)
In the above, the ProjectSet occurs before the distinctification. That
would translate into the 2-row version of the DISTINCT ON query above
being correct.
I think that roughly translates into changes being required in both
make_sort_input_target() and expression_returns_set().
Does anyone have any other opinions?
David
[1] https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-05-30 02:18:59 | Re: BUG #17502: View based on window functions returns wrong results when queried |
Previous Message | Rob Sargent | 2022-05-30 00:02:30 | Re: psql 15beta1 does not print notices on the console until transaction completes |