Re: dumb question

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: John R Pierce <pierce(at)hogranch(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: dumb question
Date: 2016-06-02 22:19:38
Message-ID: CAKFQuwYgCb6+v8T82yunGiAmwstSDgwDYTdajXFB-9VHannPFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jun 2, 2016 at 5:44 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:

> On Thu, Jun 2, 2016 at 3:23 PM, David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> > On Thu, Jun 2, 2016 at 4:11 PM, John R Pierce <pierce(at)hogranch(dot)com>
> wrote:
> >> Thanks all the below seem to do the trick.
>
> I doubt it -- using NOT IN requires (per the SQL specification)
> handling NULLs in a way that probably does not give you the answer
> you want. (NOT IN also is often much slower than the NOT EXISTS
> test which will actually give you the answer you want.)
>
> test=# create table t (id int not null primary key, ref_id int, sts
> int not null default 0);
> CREATE TABLE
> test=# insert into t values
> (1,null,0),(2,1,1),(3,null,0),(4,null,0),(5,4,1),(6,null,0),(7,6,1);
> INSERT 0 7
> test=# select max(id) from t where sts=0 and id not in (select ref_id from
> t);
> max
> -----
>
> (1 row)
>
> test=# select max(id) from t t1 where sts = 0 and not exists (select *
> from t t2 where t2.ref_id = t1.id);
> max
> -----
> 3
> (1 row)
>
> Note that providing minimal setup (like the above) helps in getting
> good answers quickly.
>
> >> do note, this is whats known as an 'anti-join', and these can be pretty
> >> expensive on large tables.
> >
> > +1
>
> *Can* be. Proper indexing can make them very reasonable.
>

​Doh (me)...

Indeed, NOT IN (...) doesn't qualify as an anti-join since (for one) it
cannot (I don't think) be optimized in this way as the entire contents of
the IN() need to be determined. IOW, its not really a join but just
another predicate condition whose one side is a subquery.

That said, writing out a full anti-join NOT EXISTS (or, similarly, a
semi-join EXISTS) clause can be a bit tedious for ad-hoc stuff while the
IN() variation is a bit more succinct and, I'd venture to say,
unfortunately familiar. In can be made to work in this situation by
writing the expression as IN (SELECT ref_id FROM t WHERE ref_id IS NOT
NULL).

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2016-06-02 22:55:30 Re: RowDescription via the SQL?
Previous Message David G. Johnston 2016-06-02 22:10:10 Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3