Re: Help understanding indexes, explain, and optimizing

From: Chris <dmagick(at)gmail(dot)com>
To: "i(dot)v(dot)r(dot)" <ivanvega(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help understanding indexes, explain, and optimizing
Date: 2006-03-07 00:40:19
Message-ID: 440CD673.8080208@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

i.v.r. wrote:
> Hi everyone,
>
> I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm
> experiencing some performance issues.
>
> Please take a look at the following query:
>
> SELECT
> /*groups."name" AS t2_r1,
> groups."id" AS t2_r3,
> groups."user_id" AS t2_r0,
> groups."pretty_url" AS t2_r2,
> locations."postal_code" AS t0_r6,
> locations."pretty_url" AS t0_r7,
> locations."id" AS t0_r8,
> locations."colony_id" AS t0_r0,
> locations."user_id" AS t0_r1,
> locations."group_id" AS t0_r2,
> locations."distinction" AS t0_r3,
> locations."street" AS t0_r4,
> locations."street_2" AS t0_r5,
> schools."updated" AS t1_r10,
> schools."level_id" AS t1_r4,
> schools."pretty_url" AS t1_r11,
> schools."user_id" AS t1_r5,
> schools."id" AS t1_r12,
> schools."type_id" AS t1_r6,
> schools."distinction" AS t1_r7,
> schools."cct" AS t1_r8,
> schools."created_on" AS t1_r9,
> schools."location_id" AS t1_r0,
> schools."service_id" AS t1_r1,
> schools."sustentation_id" AS t1_r2,
> schools."dependency_id" AS t1_r3*/
> groups.*,
> locations.*,
> schools.*
> FROM locations
> LEFT OUTER JOIN groups ON groups.id = locations.group_id
> LEFT OUTER JOIN schools ON schools.location_id = locations.id
> WHERE (colony_id = 71501)
> ORDER BY groups.name, locations.distinction, schools.distinction
>
> As you can see, I've commented out some parts. I did that as an
> experiment, and it improved the query by 2x. I really don't understand
> how is that possible... I also tried changing the second join to an
> INNER join, and that improves it a little bit also.
>
> Anyway, the main culprit seems to be that second join. Here's the output
> from EXPLAIN:
>
> Sort (cost=94315.15..94318.02 rows=1149 width=852)
> Sort Key: groups.name, locations.distinction, schools.distinction
> -> Merge Left Join (cost=93091.96..94256.74 rows=1149 width=852)
> Merge Cond: ("outer".id = "inner".location_id)
> -> Sort (cost=4058.07..4060.94 rows=1148 width=646)
> Sort Key: locations.id
> -> Hash Left Join (cost=1.01..3999.72 rows=1148 width=646)
> Hash Cond: ("outer".group_id = "inner".id)
> -> Index Scan using locations_colony_id on
> locations (cost=0.00..3992.91 rows=1148 width=452)
> Index Cond: (colony_id = 71501)
> -> Hash (cost=1.01..1.01 rows=1 width=194)
> -> Seq Scan on groups (cost=0.00..1.01
> rows=1 width=194)
> -> Sort (cost=89033.90..89607.67 rows=229510 width=206)
> Sort Key: schools.location_id
> -> Seq Scan on schools (cost=0.00..5478.10 rows=229510
> width=206)
>
> I don't completely understand what that output means, but it would seem
> that the first join costs about 4000, but if I remove that join from the
> query, the performance difference is negligible. So as I said, it seems
> the problem is the join on the schools table.
>
> I hope it's ok for me to post the relevant tables here, so here they are
> (I removed some constraints and indexes that aren't relevant to the
> query above):
>
> CREATE TABLE groups
> (
> user_id int4 NOT NULL,
> name varchar(50) NOT NULL,
> pretty_url varchar(50) NOT NULL,
> id serial NOT NULL,
> CONSTRAINT groups_pk PRIMARY KEY (id),
> )
>
> CREATE TABLE locations
> (
> colony_id int4 NOT NULL,
> user_id int4 NOT NULL,
> group_id int4 NOT NULL,
> distinction varchar(60) NOT NULL,
> street varchar(60) NOT NULL,
> street_2 varchar(50) NOT NULL,
> postal_code varchar(5) NOT NULL,
> pretty_url varchar(60) NOT NULL,
> id serial NOT NULL,
> CONSTRAINT locations_pk PRIMARY KEY (id),
> CONSTRAINT colony FOREIGN KEY (colony_id)
> REFERENCES colonies (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT "group" FOREIGN KEY (group_id)
> REFERENCES groups (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> )
> CREATE INDEX locations_fki_colony
> ON locations
> USING btree
> (colony_id);
> CREATE INDEX locations_fki_group
> ON locations
> USING btree
> (group_id);
>
> CREATE TABLE schools
> (
> location_id int4 NOT NULL,
> service_id int4 NOT NULL,
> sustentation_id int4 NOT NULL,
> dependency_id int4 NOT NULL,
> level_id int4 NOT NULL,
> user_id int4 NOT NULL,
> type_id int4 NOT NULL,
> distinction varchar(25) NOT NULL,
> cct varchar(20) NOT NULL,
> created_on timestamp(0) NOT NULL,
> updated timestamp(0),
> pretty_url varchar(25) NOT NULL,
> id serial NOT NULL,
> CONSTRAINT schools_pk PRIMARY KEY (id),
> CONSTRAINT "location" FOREIGN KEY (location_id)
> REFERENCES locations (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> )
> CREATE INDEX schools_fki_location
> ON schools
> USING btree
> (location_id);
>
> So I'm wondering what I'm doing wrong. I migrated this database from
> MySQL, and on there it ran pretty fast.

Have you done an 'analyze' or 'vacuum analyze' over these tables?

A left outer join gets *everything* from the second table:

> LEFT OUTER JOIN groups ON groups.id = locations.group_id
> LEFT OUTER JOIN schools ON schools.location_id = locations.id

So they will load everything from groups and schools. Maybe they should
be left join's not left outer joins?

--
Postgresql & php tutorials
http://www.designmagick.com/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message i.v.r. 2006-03-07 02:11:47 Re: Help understanding indexes, explain, and optimizing
Previous Message Jim C. Nasby 2006-03-07 00:19:13 Re: Planner enhancement suggestion.