Quick Links

Re: SQL WHERE: many sql or large IN()

From:	tom <tom(at)tacocat(dot)net>
To:	Postgresql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: SQL WHERE: many sql or large IN()
Date:	2007-04-07 00:48:22
Message-ID:	6140957A-5C19-4880-828B-6CB8391433C8@tacocat.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

This is some good stuff and I can use the explain analyze going forward.
But I can't get these VALUES queries to work.

I checked and I am on version 8.1. but I think from the docs that I
should still be able to do this.

queue=> select t.* from test t, (values(4, 23,84884,1,324234)) as v
where t.value = v.column1;
ERROR: syntax error at or near ")" at character 54
LINE 1: ...ect t.* from test t, (values(4, 23,84884,1,324234)) as v
whe...

^
Not sure if the carat is in the right place on this email, but it's
the second ')' that's the problem.

On Apr 6, 2007, at 2:26 PM, Listmail wrote:

> On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht
> <markus(at)bluegap(dot)ch> wrote:
>
>> Hi,
>>
>> tom wrote:
>>> Initially it seems that the WHERE IN (...) approach takes a turn
>>> for the worse when the list gets very large.
>>
>
> Since I use this a lot on webpages, I thought maybe a little
> benchmark is in order ?
>
> CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL );
> INSERT INTO test (value) SELECT * FROM generate_series( 1, 1000000 );
> CREATE INDEX test_value ON test( value );
> ANALYZE test;
>
> My script runs EXPLAIN ANALYZE 10 times and keeps the fastest
> timing, then displays it.
> Then it executes the query enough times so that 1 second is
> elapsed (also fetching the results), then prints the real runtime.
>
> Well, the table is clustered (since it was created from
> generate_series) so of course, bitmap index scan rules.
> Get the first 300 rows of the table using column 'value' which is
> an indexed TEXT column.
> (using 'id' which is an integer is a bit faster)
>
> SELECT * FROM test WHERE value IN <300 values>
> Bitmap Heap Scan on test (cost=1225.83..2317.08 rows=300 width=13)
> (actual time=3.736..3.807 rows=300 loops=1)
> Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
> -> Bitmap Index Scan on test_value (cost=0.00..1225.75 rows=300
> width=0) (actual time=3.717..3.717 rows=300 loops=1)
> Index Cond: (value = ANY ('{<300 values>}'::text[]))
> Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms
> Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE
> t.value=v.column1
> Nested Loop (cost=0.00..2447.27 rows=300 width=13) (actual
> time=0.035..4.724 rows=300 loops=1)
> -> Values Scan on "*VALUES*" (cost=0.00..3.75 rows=300
> width=32) (actual time=0.002..0.219 rows=300 loops=1)
> -> Index Scan using test_value on test t (cost=0.00..8.13
> rows=1 width=13) (actual time=0.013..0.014 rows=1 loops=300)
> Index Cond: (t.value = "*VALUES*".column1)
> Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms
> Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations)
> ----------------------------------------
> SELECT * FROM test WHERE value='1'
> Index Scan using test_value on test (cost=0.00..8.40 rows=1
> width=13) (actual time=0.014..0.015 rows=1 loops=1)
> Index Cond: (value = '1'::text)
> Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms
> Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations)
>
> Now if we ask for 300 random rows out of the million in the table,
> which is a lot more likely situation...
>
> SELECT * FROM test WHERE value IN <300 values>
> Bitmap Heap Scan on test (cost=1225.83..2317.08 rows=300 width=13)
> (actual time=4.516..4.945 rows=300 loops=1)
> Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
> -> Bitmap Index Scan on test_value (cost=0.00..1225.75 rows=300
> width=0) (actual time=4.451..4.451 rows=300 loops=1)
> Index Cond: (value = ANY ('{<300 values>}'::text[]))
> Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms
> Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE
> t.value=v.column1
> Nested Loop (cost=0.00..2447.27 rows=300 width=13) (actual
> time=0.046..5.503 rows=300 loops=1)
> -> Values Scan on "*VALUES*" (cost=0.00..3.75 rows=300
> width=32) (actual time=0.001..0.200 rows=300 loops=1)
> -> Index Scan using test_value on test t (cost=0.00..8.13
> rows=1 width=13) (actual time=0.016..0.016 rows=1 loops=300)
> Index Cond: (t.value = "*VALUES*".column1)
> Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms
> Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations)
>
> Doing a query per row is a lot slower than it appears in EXPLAIN
> ANALYZE when you add all the overhead. In fact it sucks. That was
> ecpected of course. I could tell you about when I got this php/
> mysql website where the previous developer's idea of a join was a
> select in a PHP for() loop because "joins are slow". It was hell.
>
> I tried to query 1, 2, 10, 1000, 10k and 100k random rows (thats
> abuse) and :
> - one query per row is (obviously) the slowest unless you select
> only one row
> - IN reverts to a single index scan if it contains one value, else
> it always uses a bitmap scan
> - VALUES join uses nested loop until about 100k rows where it
> switches to hash join
>
> I also tried on a real table with many columns. In that case,
> there are less tuples per page, and bitmap scans are more efficient
> than nested loops, so IN wins.
> So, IN() does not turn to crap anymore like it used to !
>
> However, if you are asking for this question because you don't
> want to use temporary tables, it seems that temp tables have been
> upgraded so they are now usable even in a webpage (well, not ALL
> webpages, but that big nasty slow webpage, you know which one I'm
> talking about).
>
>
>
>
>
>
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

In response to

Re: SQL WHERE: many sql or large IN() at 2007-04-06 18:26:17 from Listmail

Responses

Re: SQL WHERE: many sql or large IN() at 2007-04-20 07:56:39 from Chris

Browse pgsql-general by date

	From	Date	Subject
Next Message	Schwenker, Stephen	2007-04-07 01:36:07	Re: 8.2.3 AutoVacuum not running
Previous Message	John D. Burger	2007-04-07 00:03:42	New to concurrency