| From: | Tobias Hoffmann <ldev-list(at)thax(dot)hardliners(dot)org> |
|---|---|
| To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Non-trivial condition is only propagated to one side of JOIN |
| Date: | 2024-08-25 17:04:32 |
| Message-ID: | 13ebf00f-2991-43c5-5d11-a8216cb3217a@thax.hardliners.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 25/08/2024 17:35, David G. Johnston wrote:
> On Sunday, August 25, 2024, Tobias Hoffmann
> <ldev-list(at)thax(dot)hardliners(dot)org> wrote:
>
>
> 3) Problematic example:
>
> # EXPLAIN SELECT * FROM tbl1 JOIN tbl2 ON tbl2.site_id =
> tbl1.site_id WHERE tbl1.site_id = 1 OR tbl1.site_id IS NULL;
>
>
> The “is null” predicate in this query is doing nothing as your next
> comment alludes to; you will produce no rows out of the join with a
> null site_id due to the use of the equals operator in the join.
Well, that's why I said: "keep in mind that this example is as
simplified as possible"...
Even though `tbl1.site_id = 1` is – in this case – completely equivalent
to `tbl1.site_id = 1 OR tbl1.site_id IS NULL` – the first one is
completely pushed down, but the second is not.
A more complete example might look more like this:
CREATE VIEW "subview1" AS
SELECT tbl1.site_id, ... JOIN ... ON tbl1.site_id = tbl2.site_id
WHERE ...;
CREATE VIEW "view1" AS
SELECT site_id, ... FROM subview1 -- maybe even: WHERE site_id IS
NOT NULL
UNION ALL
SELECT null, ...;
SELECT * FROM view1 WHERE (site_id = 1 OR site_id IS NULL);
The reason, why the outer query would have a more complicated condition
might have nothing to do with the subquery containing the JOIN.
(This is also not a `UNION ALL` special case: `site_id IS NULL` could
also be generated by a LEFT JOIN, e.g.)
But not pushing down the condition has the grave impact, that those -
otherwise working VIEWs (i.e. subview1) become "unfixably" broken, for
certain WHERE-conditions on the outside.
Another reason why I said `site_id INTEGER NOT NULL` and `IS NOT
DISTINCT FROM`, is that there might be some mathematical reason I'm not
yet aware of where the propagation would not be sound, but which would
not apply for arbitrary site_id [nullable, ...].
My primary goal is to find at least *some* way to get the condition
pushed further in to avoid the full table scan, and to not have to
completely rewrite all the VIEWs into a single big query, where I could
inject the site_id parameter (e.g. "$1") in multiple places as needed...
Tobias
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Greg Sabino Mullane | 2024-08-25 17:14:36 | Re: Better error message when --single is not the first arg to postgres executable |
| Previous Message | Tom Lane | 2024-08-25 16:52:49 | Re: Non-trivial condition is only propagated to one side of JOIN |