Re: Have I b0rked something? Slow comparisons on "where x in (...)"

From: Listmail <lists(at)peufeu(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>, "Stephen Harris" <lists(at)spuddy(dot)org>
Cc: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Have I b0rked something? Slow comparisons on "where x in (...)"
Date: 2007-05-03 17:05:01
Message-ID: op.trrgen13zcizji@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> Try creating a temporary table, populating with the list and joining
> against it. That's probably your best bet for a long list of target
> values.

Check :

forum_bench=> CREATE TABLE test (value INTEGER NOT NULL);
CREATE TABLE
forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 1000000 );
INSERT 0 1000000
forum_bench=> ANALYZE test;
forum_bench=> EXPLAIN ANALYZE SELECT * FROM test;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.030..349.699 rows=1000000 loops=1)
Total runtime: 542.914 ms
(2 lignes)

OK : 542 ms to grab the data.
IN() :

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to
999000 in steps of 1000 ):
Seq Scan on test (cost=0.00..1264310.24 rows=1000 width=4) (actual
time=17.649..17977.085 rows=999 loops=1)
Filter: (value = ANY ('{0,1000..........99000}'::integer[]))
Total runtime: 17978.061 ms

Ouch.

forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES
(0),(1000),(2000),....................(998000),(999000));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=19.50..18176.45 rows=200 width=4) (actual
time=2.823..736.960 rows=999 loops=1)
Hash Cond: (test.value = "*VALUES*".column1)
-> Seq Scan on test (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.032..335.680 rows=1000000 loops=1)
-> Hash (cost=17.00..17.00 rows=200 width=4) (actual
time=2.108..2.108 rows=1000 loops=1)
-> HashAggregate (cost=15.00..17.00 rows=200 width=4) (actual
time=1.165..1.542 rows=1000 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000
width=4) (actual time=0.004..0.478 rows=1000 loops=1)
Total runtime: 737.362 ms

Removing the 542 ms to read the table, we see checking if the values are
in the hash is really rally fast.

So, obvious truth : hash is faster than dumb compare. Much faster.
Now, postgres should do this on its own, I think.

PS : if the 1000 values are all the same (1000 times 1), IN() doesn't
detect it, so the runtime does not change. Hash join doesn't care, so the
runtime doesn't change either.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Listmail 2007-05-03 17:12:52 Re: Have I b0rked something? Slow comparisons on "where x in (...)"
Previous Message Islam Hegazy 2007-05-03 16:34:27 Re: C functions under windows