From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tender Wang <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Anti join confusion |
Date: | 2025-03-11 03:46:16 |
Message-ID: | CAMbWs4-Ak1-NsbmBBOBF=Bpr-sjt2Z-wRCb6EJJ=fJhkd6c90A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 27, 2025 at 6:58 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah. The key problem blocking doing something about it in the
> planner is that at the time we want to do join tree restructuring,
> we haven't yet collected the per-relation data that would allow
> us to know about NOT NULL constraints, nor run expression
> simplification that would be important for detecting all but the
> most basic cases. Doing either of those things twice seems like
> a nonstarter from a planner-speed viewpoint. It's possible that
> we could rearrange things to collect catalog data sooner.
> (I'm vaguely imagining that we'd collect that stuff by relation
> OID to begin with, and then when we build RelOptInfos later on
> we could link to the per-rel-OID catalog data.) Moving expression
> simplification up seems considerably more fraught, but maybe we could
> handle the cases of common interest without having done that.
Yeah, this is a problem we need to resolve if we want to go with this
option.
Another challenge, as far as I can see, is that determining whether
the outer side contains NULL values can be quite tricky due to join
reordering. As an example, consider
select * from t t1
join t t2 on t1.a = t2.a
join t t3 on t2.a NOT IN (select a from t t4);
Due to the strict qual "t1.a = t2.a", we might assume that "t2.a"
cannot be NULL in the NOT IN sublink and proceed to convert it into an
anti-join. However, the join order could end up as
((t1 JOIN (t2 ANTI JOIN t4)) JOIN t3)
... in which case "t2.a" can be NULL in the anti-join.
I haven't fully thought this through and am not sure if there are
other cases of join reordering that could lead us to make the wrong
decision about whether the outer side of a NOT IN contains NULL
values. But this is something we need to get right.
I haven't seen much discussion about the second option (adding support
in the executor to handle the NULL semantics of NOT IN). Any thoughts
on that?
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2025-03-11 03:47:16 | Re: Commitfest app release on Feb 17 with many improvements |
Previous Message | David G. Johnston | 2025-03-11 03:36:45 | Re: Documentation Edits for pg_createsubscriber |