Help understanding indexes, explain, and optimizing a query

From: "i(dot)v(dot)r(dot)" <ivanvega(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Help understanding indexes, explain, and optimizing a query
Date: 2006-03-07 00:15:55
Message-ID: 440CD0BB.2020800@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everyone,

I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm
experiencing some performance issues.

Please take a look at the following query:

SELECT
/*groups."name" AS t2_r1,
groups."id" AS t2_r3,
groups."user_id" AS t2_r0,
groups."pretty_url" AS t2_r2,
locations."postal_code" AS t0_r6,
locations."pretty_url" AS t0_r7,
locations."id" AS t0_r8,
locations."colony_id" AS t0_r0,
locations."user_id" AS t0_r1,
locations."group_id" AS t0_r2,
locations."distinction" AS t0_r3,
locations."street" AS t0_r4,
locations."street_2" AS t0_r5,
schools."updated" AS t1_r10,
schools."level_id" AS t1_r4,
schools."pretty_url" AS t1_r11,
schools."user_id" AS t1_r5,
schools."id" AS t1_r12,
schools."type_id" AS t1_r6,
schools."distinction" AS t1_r7,
schools."cct" AS t1_r8,
schools."created_on" AS t1_r9,
schools."location_id" AS t1_r0,
schools."service_id" AS t1_r1,
schools."sustentation_id" AS t1_r2,
schools."dependency_id" AS t1_r3*/
groups.*,
locations.*,
schools.*
FROM locations
LEFT OUTER JOIN groups ON groups.id = locations.group_id
LEFT OUTER JOIN schools ON schools.location_id = locations.id
WHERE (colony_id = 71501)
ORDER BY groups.name, locations.distinction, schools.distinction

As you can see, I've commented out some parts. I did that as an
experiment, and it improved the query by 2x. I really don't understand
how is that possible... I also tried changing the second join to an
INNER join, and that improves it a little bit also.

Anyway, the main culprit seems to be that second join. Here's the output
from EXPLAIN:

Sort (cost=94315.15..94318.02 rows=1149 width=852)
Sort Key: groups.name, locations.distinction, schools.distinction
-> Merge Left Join (cost=93091.96..94256.74 rows=1149 width=852)
Merge Cond: ("outer".id = "inner".location_id)
-> Sort (cost=4058.07..4060.94 rows=1148 width=646)
Sort Key: locations.id
-> Hash Left Join (cost=1.01..3999.72 rows=1148 width=646)
Hash Cond: ("outer".group_id = "inner".id)
-> Index Scan using locations_colony_id on
locations (cost=0.00..3992.91 rows=1148 width=452)
Index Cond: (colony_id = 71501)
-> Hash (cost=1.01..1.01 rows=1 width=194)
-> Seq Scan on groups (cost=0.00..1.01
rows=1 width=194)
-> Sort (cost=89033.90..89607.67 rows=229510 width=206)
Sort Key: schools.location_id
-> Seq Scan on schools (cost=0.00..5478.10 rows=229510
width=206)

I don't completely understand what that output means, but it would seem
that the first join costs about 4000, but if I remove that join from the
query, the performance difference is negligible. So as I said, it seems
the problem is the join on the schools table.

I hope it's ok for me to post the relevant tables here, so here they are
(I removed some constraints and indexes that aren't relevant to the
query above):

CREATE TABLE groups
(
user_id int4 NOT NULL,
name varchar(50) NOT NULL,
pretty_url varchar(50) NOT NULL,
id serial NOT NULL,
CONSTRAINT groups_pk PRIMARY KEY (id),
)

CREATE TABLE locations
(
colony_id int4 NOT NULL,
user_id int4 NOT NULL,
group_id int4 NOT NULL,
distinction varchar(60) NOT NULL,
street varchar(60) NOT NULL,
street_2 varchar(50) NOT NULL,
postal_code varchar(5) NOT NULL,
pretty_url varchar(60) NOT NULL,
id serial NOT NULL,
CONSTRAINT locations_pk PRIMARY KEY (id),
CONSTRAINT colony FOREIGN KEY (colony_id)
REFERENCES colonies (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT "group" FOREIGN KEY (group_id)
REFERENCES groups (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX locations_fki_colony
ON locations
USING btree
(colony_id);
CREATE INDEX locations_fki_group
ON locations
USING btree
(group_id);

CREATE TABLE schools
(
location_id int4 NOT NULL,
service_id int4 NOT NULL,
sustentation_id int4 NOT NULL,
dependency_id int4 NOT NULL,
level_id int4 NOT NULL,
user_id int4 NOT NULL,
type_id int4 NOT NULL,
distinction varchar(25) NOT NULL,
cct varchar(20) NOT NULL,
created_on timestamp(0) NOT NULL,
updated timestamp(0),
pretty_url varchar(25) NOT NULL,
id serial NOT NULL,
CONSTRAINT schools_pk PRIMARY KEY (id),
CONSTRAINT "location" FOREIGN KEY (location_id)
REFERENCES locations (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX schools_fki_location
ON schools
USING btree
(location_id);

So I'm wondering what I'm doing wrong. I migrated this database from
MySQL, and on there it ran pretty fast.

Kind regards,
Ivan V.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jim C. Nasby 2006-03-07 00:19:13 Re: Planner enhancement suggestion.
Previous Message Jignesh K. Shah 2006-03-06 22:11:29 Re: t1000/t2000 sun-servers