From: | Trevor Campbell <tcampbell(at)atlassian(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Trouble with plan statistics for behaviour for query. |
Date: | 2012-05-31 22:29:47 |
Message-ID: | 4FC7F0DB.40600@atlassian.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
We are having trouble with a particular query being slow in a strange manner.
The query is a join over two large tables that are suitably indexed.
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE,
CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by
CG.CREATED asc, CI.ID asc
For some tasks we run this particular query a very large number of times and it has a significant performance impact
when it runs slowly.
If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more. The problem is that
a day later the performance will have dropped back to its previously slow state.
The reason this is so hard to understand is that the activity on this table is very low, with no updates and only a
relatively small number of inserts each day, < 0.1% of the table size.
Explain output:
Sort (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
Sort Key: cg.created, ci.id
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
-> Index Scan using chggroup_issue on changegroup cg (cost=0.00..17.91 rows=8 width=33) (actual
time=0.012..0.015 rows=7 loops=1)
Index Cond: (issueid = 81001::numeric)
-> Index Scan using chgitem_chggrp on changeitem ci (cost=0.00..8.58 rows=2 width=91) (actual
time=0.005..0.005 rows=2 loops=7)
Index Cond: (groupid = cg.id)
Total runtime: 0.116 ms
The explain output always seems the same even when the performance is poor, but I can't be sure of that.
Overall it seems like PostgreSQL just forgets about the statistics it has gathered after a short while.
Schema details:
CREATE TABLE changegroup
(
id numeric(18,0) NOT NULL,
issueid numeric(18,0),
author character varying(255),
created timestamp with time zone,
CONSTRAINT pk_changegroup PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chggroup_issue
ON changegroup
USING btree
(issueid );
CREATE TABLE changeitem
(
id numeric(18,0) NOT NULL,
groupid numeric(18,0),
fieldtype character varying(255),
field character varying(255),
oldvalue text,
oldstring text,
newvalue text,
newstring text,
CONSTRAINT pk_changeitem PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chgitem_chggrp
ON changeitem
USING btree
(groupid );
CREATE INDEX chgitem_field
ON changeitem
USING btree
(field COLLATE pg_catalog."default" );
Table sizes
changegroup : 2,000,000 rows
changeitem : 2,500,000 rows
The changegroup table has on average about 4 rows per issueid value, which is the query parameter.
We run autovacuum and autoanalyse, but as the activity in the table is low these are rarely if ever invoked on these tables.
Environment.
Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu, although this is a problem across a variety of postgres
versions.
From | Date | Subject | |
---|---|---|---|
Next Message | Craig James | 2012-05-31 22:55:21 | Re: Trouble with plan statistics for behaviour for query. |
Previous Message | John R Pierce | 2012-05-31 21:53:08 | Re: PostgreSQL 9.0.4 crashing |
From | Date | Subject | |
---|---|---|---|
Next Message | Craig James | 2012-05-31 22:55:21 | Re: Trouble with plan statistics for behaviour for query. |
Previous Message | Claudio Freire | 2012-05-31 15:33:52 | Re: [HACKERS] pg_dump and thousands of schemas |