From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ronald Cruz <cruz(at)rentec(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Ford <pford(at)rentec(dot)com>, "Aaron J(dot) Garcia" <agarcia(at)rentec(dot)com> |
Subject: | Re: Query result differences between PostgreSQL 17 vs 16 |
Date: | 2025-02-28 22:16:14 |
Message-ID: | 2847214.1740780974@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
I wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>> Yeah, I think this is the cause of the failure. Specifically, when we
>> start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need
>> a way to prevent upper clauses that use only C Vars from being pushed
>> down and applied as a filter clause at the lower B/C join. To achieve
>> this, we consider that the pushed-down B/C join has not completed, and
>> hence do not include B/C join's ojrelid in its relid set (it will be
>> added when we form the A/B join).
>> This is why, in this case, the version of 'customer.cid IS NOT NULL'
>> that is not marked as nullable by the left join to customer is chosen:
>> we've assumed that that left join has not completed yet.
> Oh! So I found the right solution through the wrong chain of
> reasoning ;=). It's not that the clone group is missed during qual
> selection: it's that the selected representative lacks a nullingrel
> bit, so when add_join_clause_to_rel acts on it later, it thinks it's
> okay to throw away.
I found the time to trace through this example in detail, and your
description is accurate. This is explained (perhaps not at sufficient
length) in optimizer/README:
As usual, outer join identity 3 complicates matters. If we start with
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
then the parser will have marked any C Vars appearing above these joins
with the RT index of the B/C join. If we now transform to
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
then it would appear that a clause using only such Vars could be pushed
down and applied as a filter clause (not a join clause) at the lower
B/C join. But *this might not give the right answer* since the clause
might see a non-null value for the C Var that will be replaced by null
once the A/B join is performed. We handle this by saying that the
pushed-down join hasn't completely performed the work of the B/C join
and hence is not entitled to include that outer join relid in its
relid set. When we form the A/B join, both outer joins' relids will
be added to its relid set, and then the upper clause will be applied
at the correct join level. (Note there is no problem when identity 3
is applied in the other direction: if we started with the second form
then upper C Vars are marked with both outer join relids, so they
cannot drop below whichever join is applied second.) Similarly,
Vars representing the output of a pushed-down join do not acquire
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
nullingrel bits for that join until after the upper join is performed.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So in the case at hand, we generate a clone of the "rc3.id IS NOT
NULL" clause that doesn't have a nullingrel bit for the left join to
rc3, but nonetheless is expected to be applied after that join is
performed. There is also a clone that does have a nullingrel bit, but
that doesn't save us in this example because it will only be applied
in join orders other than the one that gets chosen.
It'd certainly be nice to make that cleaner, but I'm not sure how,
and we wouldn't risk back-patching the nontrivial changes that would
be needed. So I think rejecting has_clone/is_clone clauses is the
only path forward for now.
Also, I found a test case proving that restriction_is_always_false
needs it too. If we patch only restriction_is_always_true, then
try this modification of the submitted repro:
regression=# explain (costs off)
SELECT * FROM rc1
LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description
LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference
LEFT JOIN LATERAL rc_select(rc3.id) rs ON rc3.id IS NOT NULL and rc3.rc2_reference IS NULL;
QUERY PLAN
--------------------------------------------------------------------
Hash Right Join
Hash Cond: ((rc2.rc1_reference)::text = (rc1.description)::text)
-> Nested Loop Left Join
Join Filter: ((rc3.id IS NOT NULL) AND false)
-> Hash Right Join
Hash Cond: (rc3.rc2_reference = rc2.id)
-> Seq Scan on rc3
-> Hash
-> Seq Scan on rc2
-> Result
One-Time Filter: false
-> Hash
-> Seq Scan on rc1
(13 rows)
restriction_is_always_false has decided that "rc3.rc2_reference IS
NULL" is always false, although for exactly the same reasons discussed
here, it isn't. One other interesting point here is that the code has
also decided that the scan on rc_select can be reduced to dummy
because the ON qual is unsatisfiable. I think that is actually
correct in this case, but the conclusion is based on very faulty
logic, so I don't have any faith that there aren't nearby cases where
that would be another bug. With both is_always functions patched,
I get
regression=# explain (costs off)
SELECT * FROM rc1
LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description
LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference
LEFT JOIN LATERAL rc_select(rc3.id) rs ON rc3.id IS NOT NULL and rc3.rc2_reference IS NULL;
QUERY PLAN
-----------------------------------------------------------------------------
Hash Right Join
Hash Cond: ((rc2.rc1_reference)::text = (rc1.description)::text)
-> Nested Loop Left Join
Join Filter: ((rc3.id IS NOT NULL) AND (rc3.rc2_reference IS NULL))
-> Hash Right Join
Hash Cond: (rc3.rc2_reference = rc2.id)
-> Seq Scan on rc3
-> Hash
-> Seq Scan on rc2
-> Function Scan on rc_select rs
-> Hash
-> Seq Scan on rc1
(12 rows)
Maybe that's leaving something on the table, but I'm content to call
it good for today.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Sandeep Thakkar | 2025-03-01 11:20:17 | Re: BUG #18819: Error installing Postgresql |
Previous Message | Tom Lane | 2025-02-28 15:48:18 | Re: Query result differences between PostgreSQL 17 vs 16 |