From: | Γιωργος Βαλκανας <lebiathan(at)gmail(dot)com> |
---|---|
To: | Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: "SELECT .. WHERE NOT IN" query running for hours |
Date: | 2011-01-07 09:29:32 |
Message-ID: | AANLkTimCc6U-xHigzNcjF3xABadg-yKV_1+kTqXRXhnL@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Fair enough!
I also turned seqscan off, so the new plan (for the NOT EXISTS) is:
Merge Anti Join (cost=0.00..212686.89 rows=1 width=313) (actual
time=0.426..14921.344 rows=63836 loops=1)
Merge Cond: ((d2.hwdocid)::text = (d.hwdocid)::text)
-> Index Scan using hwdocid2_uniq on "Doc2" d2 (cost=0.00..19442.87
rows=96454 width=313) (actual time=0.130..1248.783 rows=96454 loops=1)
-> Index Scan using hwdocid_uniq on "Document" d (cost=0.00..189665.17
rows=949272 width=12) (actual time=0.085..11158.740 rows=948336 loops=1)
Total runtime: 15062.925 ms
Hmm.. doesn't really seem to be such a great boost on performance. But i
guess I'll be sticking to this one.
So my follow-up question on the subject is this:
Are there any particular semantics for the "NOT IN" statement that cause the
correlated query to execute for every row of the outter query, as opposed to
the "NOT EXISTS" ? Or are there any other practical reasons, related to "IN
/ NOT IN", for this to be happening? Or is it simply due to implementation
details of each RDBMS? I guess the former (or the 2nd one), since, as you
say, this is common in most databases, but I would most appreciate an answer
to clarify this.
Thanks again!
Best regards,
George
2011/1/7 Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
> On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote:
>
>>
>> 1) Why is it taking *so* long for the first query (with the "NOT IN" ) to
>> do even the simple select?
>>
> Because NOT IN has to execute the correlated subquery for every row and
> then check whether the requested value is in the result set, usually by
> doing sequential comparison. The NOT EXIST plan is also bad because there is
> no index but at least it can use very fast and efficient hash algorithm.
> Indexing the "hwdocid" column on the "Document" table or, ideally, making it
> a primary key, should provide an additional boost to your query. If you
> already do have an index, you may consider using enable_seqscan=false for
> this session, so that the "hwdocid" index will be used. It's a common wisdom
> that in the most cases NOT EXISTS will beat NOT IN. That is so all over the
> database world. I've seen that in Oracle applications, MS SQL applications
> and, of course MySQL applications. Optimizing queries is far from trivial.
>
> Μλαδεν Γογαλα
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Florian Weimer | 2011-01-07 12:45:25 | Wrong docs on checkpoint_segments? |
Previous Message | marc.hsiao | 2011-01-07 08:14:41 | Re: How to turn autovacuum prevent wrap around run faster? |