From: | Antonin Houska <ah(at)cybertec(dot)at> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Converting NOT IN to anti-joins during planning |
Date: | 2019-05-27 14:22:29 |
Message-ID: | 2759.1558966949@localhost |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Antonin Houska <ah(at)cybertec(dot)at> wrote:
> One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> not necessarily at the top of the join tree. Consider this example:
>
> CREATE TABLE a(i int);
> CREATE TABLE b(j int);
> CREATE TABLE c(k int NOT NULL);
> CREATE TABLE d(l int);
>
> SELECT *
> FROM
> a
> JOIN b ON b.j NOT IN
> ( SELECT
> c.k
> FROM
> c)
> JOIN d ON b.j = d.l;
>
> Here the b.j=d.l condition makes the planner think that the "b.j NOT IN
> (SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not
> true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is
> joined to the other tables, so the NULL values of b.j are not filtered out
> early enough.
>
> I thought it would help if find_innerjoined_rels(), when called from
> expressions_are_not_nullable(), only collected rels (and quals) from the
> subtree below the sublink, but that does not seem to help:
>
> CREATE TABLE e(m int);
>
> SELECT *
> FROM
> a
> JOIN e ON a.i = e.m
> JOIN b ON a.i NOT IN
> ( SELECT
> c.k
> FROM
> c)
> JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0);
>
> Here it might seem that the a.i=e.m condition eliminates NULL values from the
> ANTI JOIN input, but it's probably hard to prove at query preparation time
> that
>
> (((a JOIN e) JOIN b) ANTI JOIN c) JOIN d
>
> won't eventually be optimized to
>
> (((a JOIN d) JOIN b) ANTI JOIN c) JOIN e
>
> Since the join condition between "a" and "d" is not strict in this case, the
> ANTI JOIN will receive the NULL values of a.i.
>
> It seems tricky, I've got no idea of an alternative approach right now.
Just one idea: perhaps we could use something like PlaceHolderVar to enforce
evaluation of the inner join expression ("a.i=e.m" in the example above) at
certain level of the join tree (in particular, below the ANTI JOIN) -
something like make_outerjoininfo() does here:
/* Else, prevent join from being formed before we eval the PHV */
min_righthand = bms_add_members(min_righthand, phinfo->ph_eval_at);
Unlike the typical use of PHV, we would not have to check whether the
expression is not evaluated too low in the tree because the quals collected by
find_innerjoined_rels() should not reference nullable side of any outer join.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2019-05-27 14:22:37 | Re: Why does pg_checksums -r not have a long option? |
Previous Message | Andres Freund | 2019-05-27 14:04:41 | Re: [HACKERS] Runtime Partition Pruning |