From: | Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2 |
Date: | 2011-01-18 08:06:03 |
Message-ID: | 201101181006.03790.achill@matrix.gatewaynet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thanx,
Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
>
> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
> instead NOT IN, because the first clause has a better performance. So, you can use it on that way.
>
You mean this?
(from 8.4 changes)
"Create explicit concepts of semi-joins and anti-joins (Tom)
This work formalizes our previous ad-hoc treatment of IN (SELECT
...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
should result in significantly better planning of EXISTS and NOT
EXISTS queries. In general, logically equivalent IN and EXISTS
clauses should now have similar performance, whereas previously IN
often won."
I haven't found any other recent reference to this issue. And this is far from what you suggest.
Here the entry talks about "similar" performance.
Also a similar issue was hot back in 7.4 days :
"IN / NOT IN subqueries are now much more efficient
In previous releases, IN/NOT IN subqueries were joined to the
upper query by sequentially scanning the subquery looking for a
match. The 7.4 code uses the same sophisticated techniques used
by ordinary joins and so is much faster. An IN will now usually
be as fast as or faster than an equivalent EXISTS subquery; this
reverses the conventional wisdom that applied to previous
releases."
> Other questions?
> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
No, but i just tried it (on state='Active') with no impact.
> - Do you have a index on mariner.id?
Yes, It is the primary key.
> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s)
mariner
=====
id | integer | not null default nextval(('public.mariner_id_seq'::text)::regclass)
givenname | character varying(200) |
midname | character varying(100) |
surname | character varying(200) | not null
...
Indexes:
"mariner_pkey" PRIMARY KEY, btree (id)
"mariner_smauid" UNIQUE, btree (smauid)
"mariner_username_key" UNIQUE, btree (username)
"mariner_nationalityid" btree (nationalityid)
"mariner_parentid" btree (parentid)
"mariner_surname" btree (surname)
marinerstates
========
id | integer | not null default nextval(('public.marinerstates_id_seq'::text)::regclass)
marinerid | integer | not null
state | character varying(20) | not null
vslid | integer |
leave_period_days | integer |
comment | text |
starttime | timestamp with time zone | not null
endtime | timestamp with time zone |
trid | integer |
sal_bw | real | not null default 0.0
sal_ot | real | not null default 0.0
sal_lp | real | not null default 0.0
sal_misc | real | not null default 0.0
rankid | integer |
system_vslid | integer |
startport | text |
endport | text |
.....
Indexes:
"marinerstates_pkey" PRIMARY KEY, btree (id)
"marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL
"marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
"marinerstates__system_vslid" btree (system_vslid)
"marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
"marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL
"marinerstates_marinerid" btree (marinerid)
"marinerstates_marinerid_starttime" btree (marinerid, starttime)
"marinerstates_rankid" btree (rankid)
"marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
"marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL
"marinerstates_state" btree (state)
"marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
"marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL
"marinerstates_vslid" btree (vslid)
"marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
"marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL
vessels
=====
name | character varying(200) | not null
id | integer | not null default nextval(('public.vessels_id_seq'::text)::regclass)
...
Indexes:
"vessels_pkey" PRIMARY KEY, btree (id)
"vessels_name_key" UNIQUE, btree (name)
"idx_name" btree (name)
"vessels_flag" btree (flag)
"vessels_groupno" btree (groupno)
"vessels_vslstatus_idx" btree (vslstatus)
dynacom=# SELECT count(*) from mariner;
count
-------
14447
dynacom=# SELECT count(*) from marinerstates;
count
-------
51013
dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstates ms WHERE m.id=ms.marinerid group by m.id) AS marqry;
avg | stddev
--------------------+--------------------
3.5665944207508914 | 4.4416879361829170
(vessels do not play any impact in the query, so i'll leave them out)
Slow plan in 9.0.2 :
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1)
-> Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1)
Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id
Sort Method: quicksort Memory: 71kB
-> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1)
Hash Cond: (ms.marinerid = m.id)
-> Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1)
Hash Cond: (ms.vslid = vsl.id)
-> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.011..33.494 rows=2625 loops=1)
Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
-> Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118 rows=272 loops=1)
-> Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 702kB
-> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10409.498..10419.971 rows=12832 loops=1)
Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
SubPlan 1
-> Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446 rows=1454 loops=1)
-> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.493..10407.517 rows=1835 loops=1)
Join Filter: (msold.marinerid = mold.id)
-> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286 width=4) (actual time=0.007..13.931 rows=14286 loops=1)
Filter: ((marinertype)::text = 'Mariner'::text)
-> Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330 rows=1876 loops=14286)
-> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401 rows=1876 loops=1)
-> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 loops=1)
Hash Cond: (msold.marinerid = msold2.marinerid)
Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval))
-> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590 width=20) (actual time=0.003..33.952 rows=2625 loops=1)
Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
-> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 2246kB
-> Seq Scan on marinerstates msold2 (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1)
Filter: ((state)::text = 'Active'::text)
-> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876)
Index Cond: (vslold.id = msold.vslid)
Total runtime: 10463.619 ms
(37 rows)
Fast plan in 8.3.13 :
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1)
-> Sort (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1)
Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id
Sort Method: quicksort Memory: 53kB
-> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1)
Hash Cond: (ms.vslid = vsl.id)
-> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603 loops=1)
Hash Cond: (ms.marinerid = m.id)
-> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1)
Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
-> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1)
-> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 rows=12832 loops=1)
Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
SubPlan
-> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454 loops=1)
-> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)
-> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128 rows=1876 loops=1)
-> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1)
Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan))
SubPlan
-> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625)
Recheck Cond: ((marinerid = $0) AND (starttime < $2))
Filter: ((id <> $1) AND ((state)::text = 'Active'::text) AND (($2 - endtime) <= '1 year 6 mons'::interval))
-> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625)
Index Cond: ((marinerid = $0) AND (starttime < $2))
-> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30 rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876)
Index Cond: (vslold.id = msold.vslid)
-> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1 width=4) (actual time=0.007..0.012 rows=1 loops=1876)
Index Cond: (mold.id = msold.marinerid)
Filter: ((mold.marinertype)::text = 'Mariner'::text)
-> Hash (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1)
-> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273 loops=1)
Total runtime: 553.601 ms
(33 rows)
Is there any other data i could post (pg_stat,...) that would help?
thanx a lot.
>
> Regards
>
>
> Ing. Marcos Luís Ortíz Valmaseda
> Linux User # 418229 && PostgreSQL DBA
> Centro de Tecnologías Gestión de Datos (DATEC)
> http://postgresql.uci.cu
> http://www.postgresql.org
> http://it.toolbox.com/blogs/sql-apprentice
>
--
Achilleas Mantzios
From | Date | Subject | |
---|---|---|---|
Next Message | Devrim GÜNDÜZ | 2011-01-18 08:43:00 | Re: The good, old times |
Previous Message | Fernando Hevia | 2011-01-17 23:42:44 | Re: Problem with query |