From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Adam Tistler <atistler(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Select in subselect vs select = any array |
Date: | 2011-03-21 05:54:53 |
Message-ID: | AANLkTimTp__+9prmEuFh9q9rfT=La2Vhf172_bz=hEa5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello
I think so HashAggregate goes out of memory - you can try to increase
a work_mem.
There are better queries for counting duplicit then cross join
Regards
Pavel Stehule
2011/3/21 Adam Tistler <atistler(at)gmail(dot)com>:
> logicops2=# explain analyze select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) );
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1718.59..1718.60 rows=1 width=0) (actual time=509.126..509.127 rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.010..76.604 rows=100000 loops=1)
> -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.008..38.105 rows=100000 loops=1)
> -> Bitmap Heap Scan on nodes (cost=42.67..81.53 rows=10 width=0) (actual time=447.274..484.283 rows=100000 loops=1)
> Recheck Cond: (node_id = ANY ($0))
> -> Bitmap Index Scan on n_node_id_index (cost=0.00..42.67 rows=10 width=0) (actual time=447.074..447.074 rows=100000 loops=1)
> Index Cond: (node_id = ANY ($0))
> Total runtime: 509.209 ms
> (9 rows)
>
> Time: 510.009 ms
>
>
> logicops2=# explain analyze select count(*) from nodes where node_id in (select node_id from nodes limit 100000);
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=3017.17..3017.18 rows=1 width=0) (actual time=1052.866..1052.866 rows=1 loops=1)
> -> Nested Loop (cost=2887.04..3016.67 rows=200 width=0) (actual time=167.310..1021.540 rows=100000 loops=1)
> -> HashAggregate (cost=2887.04..2889.04 rows=200 width=4) (actual time=167.198..251.205 rows=100000 loops=1)
> -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.008..80.090 rows=100000 loops=1)
> -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.007..41.566 rows=100000 loops=1)
> -> Index Scan using n_node_id_index on nodes (cost=0.00..0.63 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100000)
> Index Cond: (public.nodes.node_id = public.nodes.node_id)
> Total runtime: 1053.523 ms
> (8 rows)
>
> Time: 1054.864 ms
>
>
>
> On Mar 20, 2011, at 2:51 AM, Pavel Stehule wrote:
>
>> Hello
>>
>> 2011/3/20 Adam Tistler <atistler(at)gmail(dot)com>:
>>> I have noticed that SELECT ... = ANY(ARRAY(...)) is about twice as fast as SELECT IN ( ... ).
>>> Can anyone explain a reason for this? Results are the bottom and are reproducible. I can test with other versions if that is necessary.
>>>
>>
>> send a result of EXPLAIN ANALYZE SELECT ..., please
>>
>> The reasons can be different - less seq scans, indexes
>>
>> Regards
>>
>> Pavel Stehule
>>
>>
>>
>>> ./configure --prefix=/usr/local/pgsql84 --with-openssl --with-perl
>>> CentOS release 5.4 (Final)
>>> psql (PostgreSQL) 8.4.1
>>>
>>> prompt2=# select count(*) from nodes;
>>> count
>>> --------
>>> 754734
>>> (1 row)
>>>
>>>
>>> prompt2=# \d nodes
>>> Table "public.nodes"
>>> Column | Type | Modifiers
>>> --------------+--------------------------+-----------------------------------------------------------
>>> node_id | integer | not null default nextval(('node_id_seq'::text)::regclass)
>>> node_type_id | integer | not null
>>> template_id | integer | not null
>>> timestamp | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
>>> Indexes:
>>> "nodes_pkey" PRIMARY KEY, btree (node_id)
>>> "n_node_id_index" btree (node_id)
>>> "n_node_type_id_index" btree (node_type_id)
>>> "n_template_id_index" btree (template_id)
>>>
>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 404.530 ms
>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 407.316 ms
>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 408.728 ms
>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 793.840 ms
>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 779.137 ms
>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>> count
>>> --------
>>> 100000
>>> (1 row)
>>>
>>> Time: 781.820 ms
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Adam Tistler | 2011-03-21 06:16:56 | Re: Select in subselect vs select = any array |
Previous Message | Adam Tistler | 2011-03-21 03:20:56 | Re: Select in subselect vs select = any array |