Re: Non-trivial condition is only propagated to one side of JOIN

From: Tobias Hoffmann <ldev-list(at)thax(dot)hardliners(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-trivial condition is only propagated to one side of JOIN
Date: 2024-08-26 07:42:33
Message-ID: b00b800b-591c-5699-b0e2-c6550056153d@thax.hardliners.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25/08/2024 19:28, Tom Lane wrote:
> For this particular case, you could probably get somewhere by
> writing
>
> SELECT * FROM view1 WHERE site_id = 1
> UNION ALL
> SELECT * FROM view1 WHERE site_id IS NULL;
>
Thank you for your suggestion, Tom.

Unfortunately, as I now understand, nothing *except*  `var = const` can
ever be propagated to the second branch of the join.
In particular, even just `WHERE site_id IS NULL` no longer propagates
like `WHERE site_id = 1` does.
Other cases, that do not propagate, include `WHERE site_id IN (1, 2)`.

I'll probably have to find another workaround to my current problem.

----

More generally, I think that the currently possible set is very
restrictive and affects not just edge-cases;

my SQL-Engine-implementation-fu is far from good enough for the
necessary changes, though.

Here are some more thoughts:

> Maybe some machinery could be built that would do something useful
> with an OR clause of this form,
>
> [...]
> An important point here is that "WHERE A = B AND p(A)" does not permit
> us to deduce "p(B)" for arbitrary conditions p(),

IMO the relevant equality should be the `ON tbl2.site_id IS NOT DISTINCT
FROM tbl1.site_id` (resp. `ON tbl2.site_id = tbl1.site_id`, but nulls
probably need special care), which should allow any predicate `p` only
depending on `tbl1.site_id` (i.e.`WHERE p(tbl1.site_id)`, from
"outside") to be pulled "inside" the INNER JOIN or LEFT JOIN, because no
row which does not satisfy `p(tbl1.site_id)`, and, via equivalence,
`p(tbl2.site_id)` could ever be part of the result.

More specifically, given a WHERE clause in CNF (i.e. `p0(...) AND
p1(...) AND (p2a(...) OR p2b(...)) AND ...`), every top-level term which
only uses variables which are deemed equivalent, should be allowed to
propagate.

If this is too difficult, not just single constants (`var = const`), but
sets of constants (`var = ANY(...)`) and/or especially `var IS NULL`)
should be considered.

Just my 2ct...

  Tobias

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-08-26 07:58:00 Re: how to log into commitfest.postgresql.org and begin review patch
Previous Message Peter Smith 2024-08-26 07:29:56 Re: Collect statistics about conflicts in logical replication