From: | "Chuck D(dot)" <pgsql-performance(at)nullmx(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Rewriting DISTINCT and losing performance |
Date: | 2007-05-21 04:28:30 |
Message-ID: | 200705202228.32129.pgsql-performance@nullmx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi all,
I know we've covered this before but I'm having trouble with it today.
I have some geographic data in tables that I'm working with. I have a
country, state and city table. I was selecting the country_name out of the
country table but discovered that some countries (like Antarctica) didn't
have cities in the city table.
I resolved to query the country table for only country_name's which had
country_id's in the city table - meaning the country had cities listed.
The problem was I had a couple different sources (in separate tables) with
some extraneous column data so I chose to consolidate the city tables from
the different sources and column data that I don't need because I don't have
the hardware to support it.
That was the end of my query time.
Here's the original table and query:
# \d geo.world_city
Table "geo.world_city"
Column | Type | Modifiers
------------+------------------------+-----------
city_id | integer | not null
state_id | smallint |
country_id | smallint |
rc | smallint |
latitude | numeric(9,7) |
longitude | numeric(10,7) |
dsg | character(5) |
cc1 | character(2) |
adm1 | character(2) |
city_name | character varying(200) |
Indexes:
"world_city_pk" PRIMARY KEY, btree (city_id)
"idx_world_city_cc1" btree (cc1)
"idx_world_city_cc1_adm1" btree (cc1, adm1)
"idx_world_city_country_id" btree (country_id)
"idx_world_city_name_first_letter" btree
(state_id, "substring"(lower(city_name::text), 1, 1))
"idx_world_city_state_id" btree (state_id)
explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN
(select country_id FROM geo.world_city)
;
QUERY
PLAN
-----------------------------------------------------------------------------
--------------------------------------------------------------------------
Nested Loop IN Join (cost=0.00..167.97 rows=155 width=15) (actual
time=85.502..3479.449 rows=231 loops=1)
-> Seq Scan on country (cost=0.00..6.44 rows=244 width=15) (actual
time=0.089..0.658 rows=244 loops=1)
-> Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=14.250..14.250 rows=1
loops=244)
Index Cond: (country.country_id = world_city.country_id)
Total runtime: 3479.921 ms
Odd that it took 3 seconds because every previous run has been much quicker.
The next run was:
QUERY
PLAN
-----------------------------------------------------------------------------
------------------------------------------------------------------------
Nested Loop IN Join (cost=0.00..167.97 rows=155 width=15) (actual
time=0.087..6.967 rows=231 loops=1)
-> Seq Scan on country (cost=0.00..6.44 rows=244 width=15) (actual
time=0.028..0.158 rows=244 loops=1)
-> Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=0.026..0.026 rows=1
loops=244)
Index Cond: (country.country_id = world_city.country_id)
Total runtime: 7.132 ms
(5 rows)
But that was irrelevant. I created a new table and eliminated the data and
it looks like this:
# \d geo.city
Table "geo.city"
Column | Type | Modifiers
------------+------------------------+-----------
city_id | integer | not null
state_id | smallint |
country_id | smallint |
latitude | numeric(9,7) |
longitude | numeric(10,7) |
city_name | character varying(100) |
Indexes:
"city_pk" PRIMARY KEY, btree (city_id)
"idx_city_country_id" btree (country_id) CLUSTER
Foreign-key constraints:
"city_state_id_fk" FOREIGN KEY (state_id) REFERENCES geo.state(state_id)
ON UPDATE CASCADE ON DELETE CASCADE
explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN
(select country_id FROM geo.city)
;
-- won't complete in a reasonable amount of time.
This one won't use the country_id index. The two tables have almost the same
number of rows:
cmi=# select count(*) from geo.world_city;
count
---------
1953314
(1 row)
cmi=# select count(*) from geo.city;
count
---------
2122712
(1 row)
I tried to force it and didn't see any improvement. I've vacuummed,
analyzed, clustered. Can someone help me to get only the countries who have
cities in the city table in a reasonable amount of time?
-------------------------------------------------------
From | Date | Subject | |
---|---|---|---|
Next Message | Ralph Mason | 2007-05-21 05:17:42 | Re: Ever Increasing IOWAIT |
Previous Message | Tom Lane | 2007-05-20 21:55:50 | Re: Ever Increasing IOWAIT |