Re: nested query vs left join: query planner very confused

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rysdam <drysdam(at)ll(dot)mit(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: nested query vs left join: query planner very confused
Date: 2013-11-27 16:21:09
Message-ID: 17617.1385569269@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

David Rysdam <drysdam(at)ll(dot)mit(dot)edu> writes:
> Sometimes I want to know if I have any orphans in mags, so I do a query
> like this:

> select signum from lp.Mags where signum is not null and signum not
> in (select lp.Sigs.signum from lp.Sigs)

> (I do this as a subquery because we originally had a old Sybase DB where
> outer joins were a non-standard pain and this way works the same and is
> DBMS-agnostic.)

DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(
This query is hard to optimize because of the weird behavior of NOT IN
when nulls are involved. Since you aren't complaining that the query
fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
but the planner doesn't know that. If you can transform it to a NOT
EXISTS, you'll likely get a much better plan:

select signum from lp.Mags where signum is not null and
not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

What you want is an "anti join" plan, or at least a plan that mentions
a "hashed subplan". Plain subplans are death performance-wise, because
they amount to being nestloop joins rather than anything smarter. (In
this case it's likely not choosing a hashed subplan because work_mem is
too small to allow that.)

> I also decided to try doing the query a different way:

> select lp.mags.signum from lp.mags left join lp.sigs on
> lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
> and lp.sigs.signum is null;

That's another way to get an anti-join (at least on recent PGs, I forget
if 9.0 recognizes it).

> This one runs fast for both of us. So I guess my second question is: why
> can't the query planner tell these are the same query?

They aren't. See comment about behavior with NULLs.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2013-11-27 16:26:48 Re: nested query vs left join: query planner very confused
Previous Message Vik Fearing 2013-11-27 16:15:18 Re: nested query vs left join: query planner very confused