Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index
Date: 2024-10-11 20:59:24
Message-ID: 3001952.1728680364@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Andrei Lepikhov <lepihov(at)gmail(dot)com> writes:
> Why not look if some entry of the TargetList contains the var? Something
> like attached?

I finally got my head wrapped around what is happening here.
Using a variant of Richard's example case,

SELECT b + 0 AS c FROM
(SELECT i AS b FROM t t1 UNION ALL SELECT i + 1 FROM t t2) ss
ORDER BY c;

We initially have an EquivalenceClass containing "ss.b + 0"
to represent the required ordering pathkey. When we flatten
the UNION ALL subselect into an appendrel, we add child EC
members containing transformed versions of that, namely
"t1.i + 0" and "(t2.i + 1) + 0". Similarly, the relation
targetlist for ss is "ss.b" so we derive the targetlists for
the appendrel members t1 and t2 as "t1.i" and "t2.i + 1".

Then the problem is that find_computable_ec_member is trying
to verify that "(t2.i + 1) + 0" can be computed from a
subquery that currently emits only "t2.i + 1". Its method
of pulling out just t2.i and looking for that in the subquery
tlist obviously fails.

Now, the way that the commentary for find_computable_ec_member
is written would lead you to think that what we need is to identify
that the subexpression (t2.i + 1) is available from the tlist, with
the expectation that a higher-level plan node would then compute
"subexpression + 0" from that output of a lower plan node. That's
possible but it would require expensive search to identify things.
But that's not what actually happens in the sole use of this code by
prepare_sort_from_pathkeys. What will actually happen is that we
will add the child EC member expression as a new member of the
same tlist, so that the plan node's tlist will look like

t2.i + 1, (t2.i + 1) + 0

This means that it will work so long as all of the Vars needed by the
EC member expression are available from the plan node's input, which
they surely are if they are referenced in the existing tlist. That
is, even if we wanted to compute "t2.i + 2" it'd be fine.

(This would fall down perhaps if there are volatile functions in
the sort expression, but I believe we already reject using volatile
expressions for merge append, so it's not a problem.)

So I conclude that Andrei's patch will fix it, although I don't like
the way that that requires (potentially) multiple re-executions of
pull_var_clause. I think we should refactor the code to avoid that,
which actually ends up being less code, as in the attached draft.

I wonder whether we're doing more work here than we really need to.
If the underlying table t1 has columns i and j, and we have an EC
member that references t1.j while the tlist only mentions t1.i,
wouldn't it still work to add t1.j to the tlist? So maybe groveling
through the tlist members is unnecessary and we only need to be
performing some kind of relation-level check on whether all the
required relations are included in the input. But I'm hesitant
to make that kind of leap of faith in a patch that needs to be
back-patched, especially if the problem only arises in such narrow
edge cases that we've failed to detect it for 14 years.

regards, tom lane

Attachment Content-Type Size
bug-18652-draft-fix.patch text/x-diff 4.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrei Lepikhov 2024-10-12 01:53:19 Re: Question of Parallel Hash Join on TPC-H Benchmark
Previous Message Alena Rybakina 2024-10-11 10:42:05 Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index