Re: Converting NOT IN to anti-joins during planning

From: Antonin Houska <ah(at)cybertec(dot)at>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Converting NOT IN to anti-joins during planning
Date: 2019-05-27 14:22:29
Message-ID: 2759.1558966949@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> not necessarily at the top of the join tree. Consider this example:
>
> CREATE TABLE a(i int);
> CREATE TABLE b(j int);
> CREATE TABLE c(k int NOT NULL);
> CREATE TABLE d(l int);
>
> SELECT *
> FROM
> a
> JOIN b ON b.j NOT IN
> ( SELECT
> c.k
> FROM
> c)
> JOIN d ON b.j = d.l;
>
> Here the b.j=d.l condition makes the planner think that the "b.j NOT IN
> (SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not
> true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is
> joined to the other tables, so the NULL values of b.j are not filtered out
> early enough.
>
> I thought it would help if find_innerjoined_rels(), when called from
> expressions_are_not_nullable(), only collected rels (and quals) from the
> subtree below the sublink, but that does not seem to help:
>
> CREATE TABLE e(m int);
>
> SELECT *
> FROM
> a
> JOIN e ON a.i = e.m
> JOIN b ON a.i NOT IN
> ( SELECT
> c.k
> FROM
> c)
> JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0);
>
> Here it might seem that the a.i=e.m condition eliminates NULL values from the
> ANTI JOIN input, but it's probably hard to prove at query preparation time
> that
>
> (((a JOIN e) JOIN b) ANTI JOIN c) JOIN d
>
> won't eventually be optimized to
>
> (((a JOIN d) JOIN b) ANTI JOIN c) JOIN e
>
> Since the join condition between "a" and "d" is not strict in this case, the
> ANTI JOIN will receive the NULL values of a.i.
>
> It seems tricky, I've got no idea of an alternative approach right now.

Just one idea: perhaps we could use something like PlaceHolderVar to enforce
evaluation of the inner join expression ("a.i=e.m" in the example above) at
certain level of the join tree (in particular, below the ANTI JOIN) -
something like make_outerjoininfo() does here:

/* Else, prevent join from being formed before we eval the PHV */
min_righthand = bms_add_members(min_righthand, phinfo->ph_eval_at);

Unlike the typical use of PHV, we would not have to check whether the
expression is not evaluated too low in the tree because the quals collected by
find_innerjoined_rels() should not reference nullable side of any outer join.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-05-27 14:22:37 Re: Why does pg_checksums -r not have a long option?
Previous Message Andres Freund 2019-05-27 14:04:41 Re: [HACKERS] Runtime Partition Pruning