From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: slow IN() clause for many cases
Date: 2005-10-12 09:40:45
Message-ID: slrndkpmgt.2db7.andrew+nonews@trinity.supernews.net
Lists: pgsql-hackers
On 2005-10-11, "Ilia Kantor" <ilia(at)obnovlenie(dot)ru> wrote:
> When in clause becomes large enough (>20-30 cases),
> It is much better to use "join" way of processing..
or even a different way of writing the IN clause. Here is one I've used
after considerable research:
select * from table
where field in (select (some_array_of_N_items)[i]
from generate_series(1,N) as s(i));
This generally plans as a nestloop, with a HashAggregate of the function
scan (of generate_series) on the outer path, and index lookups on the inner
path.
It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when
comparing queries of this kind. The IN (1,2,...30) form is much slower to
plan, and usually can't usefully be used in prepared form (you'd need to
prepare it separately for every different number of items); in contrast,
the array version can be prepared once and reused.
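As a sketch of what "prepared once and reused" can look like, assuming a
table test(id int) as in the timings below; the statement name, the int[]
parameter type, and the use of array_upper to size the series are my
illustrative choices, not from the original post:

```sql
-- Hypothetical illustration: prepare the array form once, passing the
-- whole list of values as a single int[] parameter.  array_upper($1, 1)
-- gives the number of elements in the one-dimensional array, so
-- generate_series enumerates every index of the parameter.
PREPARE tstplan3(int[]) AS
  SELECT *
    FROM test
   WHERE id IN (SELECT ($1)[i]
                  FROM generate_series(1, array_upper($1, 1)) AS s(i));

-- The same prepared plan then serves any list length:
EXECUTE tstplan3(ARRAY[1, 2, 3]);
EXECUTE tstplan3(ARRAY[10, 20, 30, 40, 50]);
```

This is exactly what the IN (1,2,...) form cannot do, since that would
need a separate prepared statement for every distinct list length.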
As the number of items in the IN clause increases, the planning time grows
rather radically. For example with 1000 items (here stashed in a psql
convenience variable for brevity), using 8.1beta2:
test=# prepare tstplan1 as select * from test where id in (:v);
Time: 4881.702 ms
compare:
test=# prepare tstplan2 as select * from test where id in
(select (ARRAY[:v])[i] from generate_series(1,1000) s(i));
Time: 10.889 ms
(on my machine the break-even point for these two is fewer than 20 items,
or fewer still if the array is passed in as a literal or as a single
parameter rather than constructed with ARRAY[].)
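For illustration, the literal-array variant mentioned above might look
like this (a sketch only; the table and the values are placeholders):

```sql
-- Hypothetical sketch: the array written as a literal instead of being
-- built with ARRAY[].  '{1,2,3}'::int[] is parsed as a single constant,
-- which is cheaper to plan than an N-item ARRAY[] constructor.
SELECT *
  FROM test
 WHERE id IN (SELECT ('{1,2,3}'::int[])[i]
                FROM generate_series(1, 3) AS s(i));
```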
The actual execution time of these two is very close, with the second
being about 10% slower on my system (31ms vs 34ms, based on \timing values
from psql and averaged over several goes). However, the timings returned
from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
"total runtime" line. So not only is the planning time different, but also
the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
the two forms.
What this means is that unless you're going to prepare in advance every
possible number of parameters to IN that your app is ever going to use,
the only way to get useful performance for IN queries with more than a
handful of literal values is to use an array method, in spite of the fact
that the bitmap-OR execution plan is actually at least as fast.
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services