From: David Griffiths <dgriffiths(at)boats(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Tuning/performance question.
Date: 2003-09-28 03:49:23
Message-ID: 01d201c38573$84606930$6501a8c0@griffiths2
Lists: pgsql-general, pgsql-performance
We are doing some performance testing among various databases (Oracle, MySQL
and Postgres).
One of the queries is showing Postgres lagging quite a bit:
SELECT count(*)
FROM commercial_entity, country, user_account, address_list
LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);
I ran a "vacuum analyze" after realizing that I had loaded all the data into
the database without redoing the statistics; the query jumped from 19
seconds to 41 seconds _after_ the analyze.
I'd also like to make sure the query is doing what I intend - I want the
count of records where commercial_entity matches user_account, address_list
and country, with left outer joins from address_list to state_province and
from address_list to contact_info.
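In case it helps when checking the semantics, here's the same intent spelled
out with explicit JOIN syntax - just a sketch of how I read my own
requirements, not a tuned rewrite (the duplicated 101 in the IN list
collapses to a single equality):

SELECT count(*)
FROM commercial_entity
JOIN user_account
  ON commercial_entity.user_account_id = user_account.user_account_id
JOIN address_list
  ON address_list.commercial_entity_id = commercial_entity.commercial_entity_id
JOIN country
  ON address_list.country_id = country.country_id
LEFT JOIN state_province
  ON address_list.state_province_id = state_province.state_province_id  -- optional match
LEFT JOIN contact_info
  ON address_list.contact_info_id = contact_info.contact_info_id        -- optional match
WHERE address_list.address_type_id = 101
  AND user_account.user_role_id = 101;

I understand that explicit JOIN syntax can also constrain the join order the
planner considers, so the plan for this form could differ from the original.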
Finally, I read some posts about shared_buffers; they stated that
shared_buffers should be set to 1/4 to 1/5 of the total memory available. Is
that correct? I give the MySQL/InnoDB buffers about 70% of the 2 GB on the
machine.
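For reference, here's roughly what that 1/4 rule would translate to in
postgresql.conf on this 2 GB box - these are just my back-of-the-envelope
numbers (7.x takes shared_buffers as a count of 8 KB buffers), not values
I've tested:

shared_buffers = 65536    # 65536 buffers * 8 KB = 512 MB, about 1/4 of 2 GB
# 1/5 of 2 GB would be roughly shared_buffers = 52400 (~410 MB)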
Here's the explain (I'm not too familiar with reading a Postgres
explain...):
------------------------------------------------------------------------------
Aggregate  (cost=52951.09..52951.09 rows=1 width=116)
  ->  Merge Join  (cost=52941.61..52950.83 rows=105 width=116)
        Merge Cond: ("outer".country_id = "inner".country_id)
        ->  Index Scan using country_pkey on country  (cost=0.00..7.54 rows=231 width=11)
        ->  Sort  (cost=52941.61..52941.88 rows=105 width=105)
              Sort Key: address_list.country_id
              ->  Merge Join  (cost=52729.54..52938.07 rows=105 width=105)
                    Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id)
                    ->  Sort  (cost=8792.01..8792.52 rows=201 width=36)
                          Sort Key: commercial_entity.commercial_entity_id
                          ->  Nested Loop  (cost=0.00..8784.31 rows=201 width=36)
                                ->  Index Scan using usr_acc_usr_role_id_i on user_account  (cost=0.00..2403.08 rows=1401 width=12)
                                      Index Cond: (user_role_id = 101::numeric)
                                ->  Index Scan using comm_ent_usr_acc_id_i on commercial_entity  (cost=0.00..4.54 rows=1 width=24)
                                      Index Cond: (commercial_entity.user_account_id = "outer".user_account_id)
                    ->  Sort  (cost=43937.53..44173.84 rows=94526 width=69)
                          Sort Key: address_list.commercial_entity_id
                          ->  Merge Join  (cost=29019.03..32585.73 rows=94526 width=69)
                                Merge Cond: ("outer".contact_info_id = "inner".contact_info_id)
                                ->  Index Scan using contact_info_pkey on contact_info  (cost=0.00..3366.76 rows=56435 width=12)
                                ->  Sort  (cost=29019.03..29255.34 rows=94526 width=57)
                                      Sort Key: address_list.contact_info_id
                                      ->  Merge Join  (cost=16930.18..18354.55 rows=94526 width=57)
                                            Merge Cond: ("outer".state_province_id = "inner".state_province_id)
                                            ->  Index Scan using state_province_pkey on state_province  (cost=0.00..3.81 rows=67 width=11)
                                            ->  Sort  (cost=16930.18..17166.50 rows=94526 width=46)
                                                  Sort Key: address_list.state_province_id
                                                  ->  Seq Scan on address_list  (cost=0.00..6882.52 rows=94526 width=46)
                                                        Filter: (address_type_id = 101::numeric)
What are the "Sort (cost=...)" steps doing?
I noticed that joining address_list to country was slow; address_list had no
index on country_id alone, only composite indexes over multiple columns, so
I added a single-column index and ran VACUUM ANALYZE on the table (the exact
statements are sketched below, after the plan), and got:
Aggregate  (cost=54115.74..54115.74 rows=1 width=116)
  ->  Merge Join  (cost=54105.91..54115.46 rows=109 width=116)
        Merge Cond: ("outer".country_id = "inner".country_id)
        ->  Index Scan using country_pkey on country  (cost=0.00..7.54 rows=231 width=11)
        ->  Sort  (cost=54105.91..54106.19 rows=110 width=105)
              Sort Key: address_list.country_id
              ->  Merge Join  (cost=53884.34..54102.18 rows=110 width=105)
                    Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id)
                    ->  Sort  (cost=8792.01..8792.52 rows=201 width=36)
                          Sort Key: commercial_entity.commercial_entity_id
                          ->  Nested Loop  (cost=0.00..8784.31 rows=201 width=36)
                                ->  Index Scan using usr_acc_usr_role_id_i on user_account  (cost=0.00..2403.08 rows=1401 width=12)
                                      Index Cond: (user_role_id = 101::numeric)
                                ->  Index Scan using comm_ent_usr_acc_id_i on commercial_entity  (cost=0.00..4.54 rows=1 width=24)
                                      Index Cond: (commercial_entity.user_account_id = "outer".user_account_id)
                    ->  Sort  (cost=45092.32..45335.37 rows=97221 width=69)
                          Sort Key: address_list.commercial_entity_id
                          ->  Merge Join  (cost=29770.81..33338.09 rows=97221 width=69)
                                Merge Cond: ("outer".contact_info_id = "inner".contact_info_id)
                                ->  Index Scan using contact_info_pkey on contact_info  (cost=0.00..3366.76 rows=56435 width=12)
                                ->  Sort  (cost=29770.81..30013.86 rows=97221 width=57)
                                      Sort Key: address_list.contact_info_id
                                      ->  Merge Join  (cost=17271.79..18731.55 rows=97221 width=57)
                                            Merge Cond: ("outer".state_province_id = "inner".state_province_id)
                                            ->  Index Scan using state_province_pkey on state_province  (cost=0.00..3.81 rows=67 width=11)
                                            ->  Sort  (cost=17271.79..17514.84 rows=97221 width=46)
                                                  Sort Key: address_list.state_province_id
                                                  ->  Seq Scan on address_list  (cost=0.00..6882.52 rows=97221 width=46)
                                                        Filter: (address_type_id = 101::numeric)
No difference. Note that all the keys used in the joins are numeric(10)s, so
there shouldn't be any cast issues.
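For reference, the index addition and re-analyze mentioned above were along
these lines (the index name here is just illustrative, in the style of the
existing ones):

CREATE INDEX addr_list_country_id_i ON address_list (country_id);  -- single-column index on the join key
VACUUM ANALYZE address_list;  -- redo the statistics for the table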
When you create a primary key on a table, is an index created? (I seem to
remember a message going by stating that an index would be added.)
For comparison, our production Oracle database (running on nearly identical
hardware - the Postgres machine has IDE RAID-5 and the Oracle machine has
RAID mirroring) runs the same query in 1 to 2 seconds.
I've got one last question, and I really hope responses don't get
sidetracked by it: I see a lot of negative comments about MySQL, many of
them stating that it's a database layer on top of the file system. Can
someone explain why Postgres is better than MySQL 4.0.14 using InnoDB?
MySQL, with one less index (on address_list.country), runs the above query
in 0.20 seconds.
David.