Re: Non-trivial condition is only propagated to one side of JOIN

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tobias Hoffmann <ldev-list(at)thax(dot)hardliners(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-trivial condition is only propagated to one side of JOIN
Date: 2024-08-25 16:52:49
Message-ID: 2588564.1724604769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> On Sunday, August 25, 2024, Tobias Hoffmann <ldev-list(at)thax(dot)hardliners(dot)org>
> wrote:
>> 3) Problematic example:
>>
>> # EXPLAIN SELECT * FROM tbl1 JOIN tbl2 ON tbl2.site_id = tbl1.site_id
>> WHERE tbl1.site_id = 1 OR tbl1.site_id IS NULL;

> The “is null” predicate in this query is doing nothing as your next comment
> alludes to; you will produce no rows out of the join with a null site_id
> due to the use of the equals operator in the join.

Indeed. This WHERE clause might be useful with a left join to tbl1,
but then the IS NULL means something totally different and can *not*
be pushed down.

> Others may correct me but I’m guessing that indeed the optimizer has a gap
> here that could be filled in, it’s just it feels like adding code to deal
> with broken queries so isn’t overly motivated to work on.

The short answer is that we expend quite a lot of effort to deduce
implied equality clauses from combinations of equality clauses;
that is, given "WHERE A = B AND B = C" we can deduce "A = C" as well.
No such deductions can be made from equality clauses that are buried
under an OR, because they might not be true for every join row.

Maybe some machinery could be built that would do something useful
with an OR clause of this form, but I doubt it would be useful
often enough to justify the development effort and additional
planner cycles.

An important point here is that "WHERE A = B AND p(A)" does not permit
us to deduce "p(B)" for arbitrary conditions p(), because we have some
equality operators that will return true for values that sort equal
but are distinguishable in other ways. (Handy example: in float8
arithmetic, zero and minus zero compare equal, as required by the IEEE
float spec.) We presently don't assume that this works for anything
other than transitive equality across equality operators of a single
btree operator family. In particular, I don't think we could assume
it for the given example, because "WHERE A = B AND A IS NULL" is
tautologically false. You can deduce anything from a falsehood,
so I'm not entirely sure that the proposed optimization is even
logically sound.

> Joining using
> distinct instead of equality is uncommon, since nearly all models join
> primary keys to foreign keys and both of those are almost always non-null.

Yeah. I'll concede that we probably should work harder on building
out planner and executor support for IS NOT DISTINCT FROM. But again,
it just seems like a case that is not worth spending large amounts of
development time on.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tobias Hoffmann 2024-08-25 17:04:32 Re: Non-trivial condition is only propagated to one side of JOIN
Previous Message Alena Rybakina 2024-08-25 16:12:40 Re: Vacuum statistics