abnormally long time in performing a two-table join

From: Chris Mungall <cjm(at)fruitfly(dot)org>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: abnormally long time in performing a two-table join
Date: 2002-08-11 21:18:02
Message-ID: Pine.LNX.4.33.0208111408010.16003-100000@sos.lbl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I've recently made the switch from mysql to postgres - i am now a
zealous postgres convert. however, i'm experiencing some performance
issues that i'm sure are just a reflection on my inexperience admining pg.

A simple two-table natural join (on integer primary/foreign keys) seems
abnormally slow.

(I have performed VACUUM ANALYZE on both tables)

both forced index scans and the default seq+index nested loop
combination are of comparably slow speed.

I include explain analyze output for both modes, plus details on the
tables and their indexes:

omicia29=# set enable_seqscan=off;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1)
-> Index Scan using seqfeature_pkey on seqfeature (cost=0.00..61715.62 rows=44674 width=4) (actual time=0.29..14669.06 rows=100030 loops=1)
-> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030)

Total runtime: 44572.13 msec

omicia29=# set enable_seqscan=on;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id = 15;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..186352.20 rows=223 width=8) (actual time=7803.68..14224.36 rows=15 loops=1)
-> Seq Scan on seqfeature (cost=0.00..51056.43 rows=44674 width=4) (actual time=0.13..4050.09 rows=100030 loops=1)
-> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.10..0.10 rows=0 loops=100030)
Total runtime: 14224.56 msec

omicia29=# select relkind,sum(relpages) from pg_class group by relkind;
relkind | sum
---------+--------
S | 26
i | 678149
r | 487314
s | 0
t | 162194
v | 0
(6 rows)

omicia29=# select count(*) from seqfeature;
count
---------
2632194
(1 row)

omicia29=# select count(*) from sfqv;
count
---------
8284617
(1 row)

omicia29=# select count(*) from seqfeature where seqfeature_key_id = 15;
count
--------
100030
(1 row)

omicia29=# select count(*) from sfqv where qualifier_value = 'BRCA1';
count
-------
110
(1 row)

I am obviously doing something wrong, as other joins between the
"seqfeature" table and other tables of comparable size to "sfqv" are
nice and fast.

I can make this go much faster by doing two seperate queries, and
plugging the foreign keys from "sfqv" into a query on "seqfeature";
but obviously I'd rather do this in one SQL query than two.

the pg planner seems to be attempting to do the query the other way
round - doing the query that returns the most rows first??

I have tried forcing the order with a CROSS JOIN and explicitly
constraining by the seqfeature_id field, it works out the same

these are the indexes:

CREATE INDEX sfqv_idx1 ON sfqv USING btree (seqfeature_id);
CREATE INDEX sfqv_idx2 ON sfqv USING btree (ontology_term_id);
CREATE INDEX sfqv_idx3 ON sfqv USING btree (qualifier_value);
CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value);

CREATE INDEX sf1 ON seqfeature USING btree (seqfeature_key_id);
CREATE INDEX sf2 ON seqfeature USING btree (seqfeature_source_id);
CREATE INDEX sf3 ON seqfeature USING btree (bioentry_id);
CREATE INDEX sf4 ON seqfeature USING btree (seqfeature_id, seqfeature_key_id);

as you can see from the EXPLAIN above, only sfqv_idx is being used.

I would have thought sfqv_idx4 would be useful in this particular query?

and also sf4?

and finally, here are the actual tables:

CREATE TABLE "seqfeature" (
"seqfeature_id" integer DEFAULT nextval('seqfeature_pk_seq'::text) NOT N
ULL,
"bioentry_id" integer NOT NULL,
"seqfeature_key_id" integer,
"seqfeature_source_id" integer,
"seqfeature_rank" integer,
Constraint "seqfeature_pkey" Primary Key ("seqfeature_id")
);

CREATE TABLE "sfqv" (
"seqfeature_id" integer,
"ontology_term_id" integer,
"qualifier_rank" integer,
"qualifier_value" character(255)
);

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2002-08-11 21:41:35 Re: abnormally long time in performing a two-table join
Previous Message Brian McCane 2002-08-11 13:04:50 Re: Postgres performance slowly gets worse over a month