Re: Query result differences between PostgreSQL 17 vs 16

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ronald Cruz <cruz(at)rentec(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Ford <pford(at)rentec(dot)com>, "Aaron J(dot) Garcia" <agarcia(at)rentec(dot)com>
Subject: Re: Query result differences between PostgreSQL 17 vs 16
Date: 2025-02-28 15:48:18
Message-ID: 2748914.1740757698@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> On Thu, Feb 27, 2025 at 1:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I've not actually worked through the bug case to verify my hypothesis
>> that it's the lack of a representative for a clone group that is
>> breaking things. (Maybe you did?)

> Yeah, I think this is the cause of the failure. Specifically, when we
> start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need
> a way to prevent upper clauses that use only C Vars from being pushed
> down and applied as a filter clause at the lower B/C join. To achieve
> this, we consider that the pushed-down B/C join has not completed, and
> hence do not include B/C join's ojrelid in its relid set (it will be
> added when we form the A/B join).

> This is why, in this case, the version of 'customer.cid IS NOT NULL'
> that is not marked as nullable by the left join to customer is chosen:
> we've assumed that that left join has not completed yet.

Oh! So I found the right solution through the wrong chain of
reasoning ;=). It's not that the clone group is missed during qual
selection: it's that the selected representative lacks a nullingrel
bit, so when add_join_clause_to_rel acts on it later, it thinks it's
okay to throw away.

> However,
> performing the NOT NULL deduction on this clone would be unsafe. It
> seems to me that, for clone clauses, the nullingrels mark is not that
> reliable for performing such deductions.

Precisely. We have to remove the nullingrel bit from the Var, else
the placement cross-checks in setrefs.c will (correctly) complain that
it's being evaluated at the wrong level. But relying on it in this
particular way at this particular point in the proceedings gives the
wrong conclusion.

>> I'm a bit inclined to argue that we should put in the no-clone-quals
>> limitation anyway, to protect future callers that might not manage
>> to dodge the problem.

> Now I'm inclined to agree with you. It seems we've been lucky so far
> not to encounter issues with IS NULL deductions on clone clauses, but
> this feels like a problem waiting to happen in the future.

Yes, I'm on board with that too. I think the comments need more work
than you've done though. I think they should say something about
"nullingrel bits in clone clauses may not reflect reality, so we dare
not draw conclusions from clones about whether Vars are guaranteed
not-null". Also, I'm not convinced any more that this has anything to
do with missing clone members, so we should remove that claim.

> The no-clone-quals limitation in restriction_is_always_false makes the
> fix in f00ab1fd1 unnecessary. I think we should revert those changes.

Hmm, I'm less on board with that. The saving-and-restoring of
last_rinfo_serial looks correct to me, even though this fix might
cause the test case you found before to not tickle that bug.

> Attached is the updated patch.

I'm confused by the changes to existing test cases in this? It looks
like you're undoing some test changes that are not from f00ab1fd1, so
I don't understand why. My inclination here would be to add a new
test that exposes the current bug, but leave the rest alone, even
though they may no longer be testing exactly what they were originally
meant to do.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2025-02-28 22:16:14 Re: Query result differences between PostgreSQL 17 vs 16
Previous Message Richard Guo 2025-02-28 08:47:09 Re: Query result differences between PostgreSQL 17 vs 16