From: | Bart Grantham <bart(at)logicworks(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Odd performance issue |
Date: | 2005-06-20 19:27:41 |
Message-ID: | 42B718AD.50102@logicworks.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Summary: depending on the value, the planner will sometimes choose a seq
scan, sometimes an index scan. The former produces terrible performace,
the latter great performance.
The long story: we had a disk failure (NOT the disk the db was on) and
the machine's system disk had to be rebuilt from the raid array and
re-GRUB'ed. Now that the the system is back up we are seeing terrible
performance (or more accurately, wildly varying performance). I've
tried re-importing the data from the live system (this is new hardware
under testing for the system) and re-initing the db cluster. A specific
example is probably best. This 'connections' table has about 922K
rows. The difference here is node_id's 28542 vs. 28560. Using 28542
causes an index scan, 28560 causes a seq scan:
The details:
logicops2=> explain SELECT * from connections AS c LEFT JOIN
connection_types AS ct ON ct.connection_type_id = c.connection_type_id
WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.29..563.05 rows=203 width=116)
-> Hash Left Join (cost=1.29..563.05 rows=203 width=116)
Hash Cond: ("outer".connection_type_id =
"inner".connection_type_id)
-> Index Scan using c_connector_node_id, c_connectee_node_id
on connections c (cost=0.00..558.72 rows=203 width=33)
Index Cond: ((connector_node_id = 28542) OR
(connectee_node_id = 28542))
Filter: (connection_type_id < 1000)
-> Hash (cost=1.23..1.23 rows=23 width=83)
-> Seq Scan on connection_types ct (cost=0.00..1.23
rows=23 width=83)
(8 rows)
Time: 0.935 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS
ct ON ct.connection_type_id = c.connection_type_id WHERE (
connector_node_id = 28542 OR connectee_node_id = 28542 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(12 rows)
Time: 1.887 ms
-vs-
logicops2=> explain SELECT * from connections AS c LEFT JOIN
connection_types AS ct ON ct.connection_type_id = c.connection_type_id
WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit (cost=1.29..686.09 rows=300 width=116)
-> Hash Left Join (cost=1.29..24939.39 rows=10925 width=116)
Hash Cond: ("outer".connection_type_id =
"inner".connection_type_id)
-> Seq Scan on connections c (cost=0.00..24774.23 rows=10925
width=33)
Filter: (((connector_node_id = 28560) OR
(connectee_node_id = 28560)) AND (connection_type_id < 1000))
-> Hash (cost=1.23..1.23 rows=23 width=83)
-> Seq Scan on connection_types ct (cost=0.00..1.23
rows=23 width=83)
(7 rows)
Time: 0.704 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS
ct ON ct.connection_type_id = c.connection_type_id WHERE (
connector_node_id = 28560 OR connectee_node_id = 28560 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(7 rows)
Time: 578.597 ms
... it may be relevant that one node_id has 15 times as many connections:
logicops2=> select count(*) from connections where connector_node_id =
28542 OR connectee_node_id = 28542;
count
-------
856
(1 row)
Time: 1.424 ms
logicops2=> select count(*) from connections where connector_node_id =
28560 OR connectee_node_id = 28560;
count
-------
13500
(1 row)
Time: 559.696 ms
... but that shouldn't make a difference to the planner, should it?
Yes, I've vacuum analyzed.
Also, I was wondering if someone could correct me on a bit of array
syntax. I'd like to have a query pass back an array of ints to a
function call. Something like this:
logicops2=> select * from nodes2ancestors(array[(select node_id from
nodes where node_type_id = 3)]::int[], 0);
ERROR: more than one row returned by a subquery used as an expression
Thanks for any help/pointers you guys can provide. I really appreciate
it as I'm down to the wire on a project and this performance thing has
really blindsided us.
Bart
From | Date | Subject | |
---|---|---|---|
Next Message | Rohit Gaddi | 2005-06-20 19:48:33 | index selection by query planner |
Previous Message | O.B. | 2005-06-20 19:27:06 | Re: Access is denied during initdb |