not exits slow compared to not in. (nested loops killing me)

From: mark <dvlhntr(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: not exits slow compared to not in. (nested loops killing me)
Date: 2011-06-06 20:38:47
Message-ID: BANLkTinjCKm3kC4GemetBmvOcwL5HBwqww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I am trying to speed up a query on a DB I inherited and I am falling
flat on my face .

I changed a query from NOT IN to use NOT EXISTS and my query time went
from 19000ms to several hours (~50000000 ms). this shocked me so much
I pretty much had to post. This seems like a corner case of the
planner not knowing that the nested-loops are going to turn out badly
in this case. The planner choosing a 13hr nested loop here is
basically the reason I am posting.

I have played around with rewriting this query using some CTEs and a
left join but thus far my results are not encouraging. Given what
little I know , it seems like a LEFT JOIN where right_table.col is
null gets the same performance and estimates as a NOT EXISTS. (and
still picks a nested loop in this case)

I can see where it all goes to hell time wise, turning off nested
loops seems to keep it from running for hours for this query, but not
something I am looking to do globally. The time is not really that
much better than just leaving it alone with a NOT IN.

two queries are at http://pgsql.privatepaste.com/a0b672bab0#

the "pretty" explain versions :

NOT IN (with large work mem - 1GB)
http://explain.depesz.com/s/ukj

NOT IN (with only 64MB for work_mem)
http://explain.depesz.com/s/wT0

NOT EXISTS (with 64MB of work_mem)
http://explain.depesz.com/s/EuX

NOT EXISTS (with nested loop off. and 64MB of work_mem)
http://explain.depesz.com/s/UXG

LEFT JOIN/CTE (with nested loop off and 1GB of work_mem)
http://explain.depesz.com/s/Hwm

table defs, with estimated row counts (which all 100% match exact row count)
http://pgsql.privatepaste.com/c2ff39b653

tried running an analyze across the whole database, no affect.

I haven't gotten creative with explicit join orders yet .

postgresql 9.0.2.

willing to try stuff for people as I can run things on a VM for days
and it is no big deal. I can't do that on production machines.

thoughts ? ideas ?

-Mark

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2011-06-06 21:45:43 8.4/9.0 simple query performance regression
Previous Message Tom Lane 2011-06-06 20:37:51 Re: poor performance when recreating constraints on large tables