From: | Tobias Hoffmann <ldev-list(at)thax(dot)hardliners(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Non-trivial condition is only propagated to one side of JOIN |
Date: | 2024-08-26 07:42:33 |
Message-ID: | b00b800b-591c-5699-b0e2-c6550056153d@thax.hardliners.org |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 25/08/2024 19:28, Tom Lane wrote:
> For this particular case, you could probably get somewhere by
> writing
>
> SELECT * FROM view1 WHERE site_id = 1
> UNION ALL
> SELECT * FROM view1 WHERE site_id IS NULL;
>
Thank you for your suggestion, Tom.
Unfortunately, as I now understand, nothing *except* `var = const` can
ever be propagated to the second branch of the join.
In particular, even just `WHERE site_id IS NULL` no longer propagates
like `WHERE site_id = 1` does.
Other cases, that do not propagate, include `WHERE site_id IN (1, 2)`.
I'll probably have to find another workaround to my current problem.
----
More generally, I think that the currently possible set is very
restrictive and affects not just edge-cases;
my SQL-Engine-implementation-fu is far from good enough for the
necessary changes, though.
Here are some more thoughts:
> Maybe some machinery could be built that would do something useful
> with an OR clause of this form,
>
> [...]
> An important point here is that "WHERE A = B AND p(A)" does not permit
> us to deduce "p(B)" for arbitrary conditions p(),
IMO the relevant equality should be the `ON tbl2.site_id IS NOT DISTINCT
FROM tbl1.site_id` (resp. `ON tbl2.site_id = tbl1.site_id`, but nulls
probably need special care), which should allow any predicate `p` only
depending on `tbl1.site_id` (i.e.`WHERE p(tbl1.site_id)`, from
"outside") to be pulled "inside" the INNER JOIN or LEFT JOIN, because no
row which does not satisfy `p(tbl1.site_id)`, and, via equivalence,
`p(tbl2.site_id)` could ever be part of the result.
More specifically, given a WHERE clause in CNF (i.e. `p0(...) AND
p1(...) AND (p2a(...) OR p2b(...)) AND ...`), every top-level term which
only uses variables which are deemed equivalent, should be allowed to
propagate.
If this is too difficult, not just single constants (`var = const`), but
sets of constants (`var = ANY(...)`) and/or especially `var IS NULL`)
should be considered.
Just my 2ct...
Tobias
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2024-08-26 07:58:00 | Re: how to log into commitfest.postgresql.org and begin review patch |
Previous Message | Peter Smith | 2024-08-26 07:29:56 | Re: Collect statistics about conflicts in logical replication |