Re: Alias of VALUES RTE in explain plan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Yasir <yasir(dot)hussain(dot)shah(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Alias of VALUES RTE in explain plan
Date: 2025-01-02 19:51:46
Message-ID: CA+TgmobgTZCjB+_Ld3b44p42ROhwPB8aJFXGwYZhdjXcJrEWnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 2, 2024 at 1:09 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> regression=# create view vv as SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
> CREATE VIEW
> regression=# \d+ vv
> View "public.vv"
> Column | Type | Collation | Nullable | Default | Storage | Description
> --------+---------+-----------+----------+---------+---------+-------------
> x | integer | | | | plain |
> View definition:
> SELECT x
> FROM ( VALUES (4), (2), (3), (1)
> ORDER BY t1_1.x
> LIMIT 2) t1(x);
>
> ruleutils has decided that it needs to make the two "t1" table
> aliases distinct. But of course that will fail on reload:
>
> regression=# SELECT x
> regression-# FROM ( VALUES (4), (2), (3), (1)
> regression(# ORDER BY t1_1.x
> regression(# LIMIT 2) t1(x);
> ERROR: missing FROM-clause entry for table "t1_1"
> LINE 3: ORDER BY t1_1.x
> ^
> It seems sufficient to avoid alias pushdown when there's an ORDER BY
> inside the VALUES subquery. We disallow a locking clause, and
> while there can be LIMIT/OFFSET, those aren't allowed to reference the
> VALUES output anyway. I added some test cases to show that this is
> enough to make view-dumping behave sanely.

I'm concerned about taking things in this direction. There's two scans
here, really: a Values Scan for the VALUES construct, and then a
Subquery Scan sitting on top of it that will normally be optimized
away. It seems to me that you're basically guessing whether the
subquery scan will be optimized away to a sufficient degree that its
alias will not leak out anywhere. But that seems a bit fragile and
error-prone. Whether to elide the subquery scan is a planner decision;
what aliases to assign to the planner output is a ruleutils.c
decision; but here you're talking about rejiggering things at parse
time. The correctness of that rejiggering depends crucially on what
will happen at plan time and then at EXPLAIN/ruleutils time, but the
rules for what will happen at those later times are pretty darn
complicated, so I feel like this is creating an awful lot of action at
a distance.

If were able (and I suspect we're not, but hypothetically) to in
effect pull up the subquery at parse time, so that to the planner and
executor it doesn't even exist, then I think that would be perfectly
fine, because then we would have strong reasons for believing that no
later decision can turn our parse-time decision into a problem. But to
leave that subquery there and guess that it's going to disappear
before we get to EXPLAIN doesn't seem nearly as safe. It seems pretty
easy to either miss things (like the ORDER BY case) or even to have
future planner changes break stuff.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-01-02 20:12:08 Re: Vacuum statistics
Previous Message Robert Haas 2025-01-02 19:24:50 Re: magical eref alias names