Re: Anti join confusion

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tender Wang <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anti join confusion
Date: 2025-03-11 03:46:16
Message-ID: CAMbWs4-Ak1-NsbmBBOBF=Bpr-sjt2Z-wRCb6EJJ=fJhkd6c90A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 27, 2025 at 6:58 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah. The key problem blocking doing something about it in the
> planner is that at the time we want to do join tree restructuring,
> we haven't yet collected the per-relation data that would allow
> us to know about NOT NULL constraints, nor run expression
> simplification that would be important for detecting all but the
> most basic cases. Doing either of those things twice seems like
> a nonstarter from a planner-speed viewpoint. It's possible that
> we could rearrange things to collect catalog data sooner.
> (I'm vaguely imagining that we'd collect that stuff by relation
> OID to begin with, and then when we build RelOptInfos later on
> we could link to the per-rel-OID catalog data.) Moving expression
> simplification up seems considerably more fraught, but maybe we could
> handle the cases of common interest without having done that.

Yeah, this is a problem we need to resolve if we want to go with this
option.

Another challenge, as far as I can see, is that determining whether
the outer side contains NULL values can be quite tricky due to join
reordering. As an example, consider

select * from t t1
join t t2 on t1.a = t2.a
join t t3 on t2.a NOT IN (select a from t t4);

Due to the strict qual "t1.a = t2.a", we might assume that "t2.a"
cannot be NULL in the NOT IN sublink and proceed to convert it into an
anti-join. However, the join order could end up as

((t1 JOIN (t2 ANTI JOIN t4)) JOIN t3)

... in which case "t2.a" can be NULL in the anti-join.

I haven't fully thought this through and am not sure if there are
other cases of join reordering that could lead us to make the wrong
decision about whether the outer side of a NOT IN contains NULL
values. But this is something we need to get right.

I haven't seen much discussion about the second option (adding support
in the executor to handle the NULL semantics of NOT IN). Any thoughts
on that?

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-03-11 03:47:16 Re: Commitfest app release on Feb 17 with many improvements
Previous Message David G. Johnston 2025-03-11 03:36:45 Re: Documentation Edits for pg_createsubscriber