From: | Keaton Adams <kadams(at)mxlogic(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Query running slow but was running fine before |
Date: | 2008-07-07 22:06:41 |
Message-ID: | C497EF91.4A0A%kadams@mxlogic.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
PG: PostgreSQL 8.1.4
OS: RHEL 4.x
I have a set of queries on a production server that have been running fine for the past few months but as of last Friday started performing poorly. I have isolated the problem down to a particular part that is common to all queries involved and have provided an example here of what the issue is. There are multiple tables where this is a problem, not just one.
An analyze is run on the tables every day (even several times a day because they are updated very frequently) and a vacuum analyze is run on the weekends. I also tried to run an analyze specifically on the customer_id column and then the product_id column but that didn't help.
This one table listed is part of a UNION ALL that joins ten separate tables. The "real" query is against the view name of kda_log_info. I am working with just one of the tables within the view to help narrow down where the problem might be.
There may be an occasion when the product_id value is null, so this is how the query is written. It uses the index on customer_id and does a filter on product_id. The cost is high and the time it takes to scan the 964 rows is about one minute on average (where before it would return in sub-second time):
kadams(at)hostserver> more t4.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 2070101833));
kadams(at)hostserver> psql -Upostgres -dkda_log -ft4.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=426537.34..426537.35 rows=1 width=0)
-> Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24 (cost=0.00..426535.75 rows=635 width=0)
Index Cond: ((customer_id = 7767664) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: ((product_id IS NULL) OR (product_id = 2070101833))
(4 rows)
kadams(at)hostserver> time psql -Upostgres -dkda_log -ft4.sql
count
-------
964
(1 row)
real 0m54.810s
user 0m0.002s
sys 0m0.001s
kda_log=# select count(*) from kda_log_info_2008w24;
count
----------
16356303
(1 row)
Index:
kda_log_info_cid_cre_dom_2008w24_idx btree (customer_id, created, "domain")
On the same dataset, if I eliminate the IS NULL and look for just the specific product_id the optimizer picks the index on the product_id column and filters on customer_id and scans the same 964 rows in 2 milliseconds:
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
--and ((product_id IS NULL) OR (product_id = 2070101833))
and product_id = 2070101833;
kadams(at)hostserver> time psql -Upostgres -dkda_log -ft4.sql
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1649.41..1649.42 rows=1 width=0)
-> Index Scan using kda_log_info_did_cre_2008w24_idx on kda_log_info_2008w24 (cost=0.00..1647.82 rows=635 width=0)
Index Cond: ((product_id = 2070101833) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: (customer_id = 7767664)
(4 rows)
kadams(at)hostserver> time psql -Upostgres -dkda_log -ft4.sql
count
-------
964
(1 row)
real 0m0.207s
user 0m0.004s
sys 0m0.002s
There are currently no rows in the table where product_id is NULL:
mxl_log=# select count(*) from kda_log_info_2008w24 where product_id IS NULL;
-------
0
(1 row)
Index:
kda_log_info_did_cre_2008w24_idx" btree (domain_id, created)
As another comparison I ran this same query on a different database server with the same database layout (different data set, of course) which returns the same approximate number of rows for the given customer_id / product_id and it returns in sub-second time. This table has even more rows in it than on the server where performance has tanked:
kadams(at)hostserver2> more t2.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 907
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 573351));
kadams(at)hostserver2> psql -Upostgres -dkda_log -ft2.sql
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2626.36..2626.37 rows=1 width=0)
-> Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24 (cost=0.00..2626.35 rows=1 width=0)
Index Cond: ((customer_id = 907) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: ((product_id IS NULL) OR (product_id = 573351))
(4 rows)
kadams(at)hostserver2> time psql -Upostgres -dmxl_log -ft2.sql
count
-------
992
(1 row)
real 0m0.011s
user 0m0.002s
sys 0m0.002s
kda_log=# select count(*) from kda_log_info_2008w24;
count
----------
21777364
(1 row)
I compared postgresql.conf file settings between the servers and they are identical.
The physical servers are also the same configuration (8 CPU, 8 GB RAM, local fast-SCSI disk).
So, I have no idea why the query in the top of this email on this particular database server has gone from sub-second response to over a minute on average. As only part of the overall query, times have gone from a few seconds to 15-20 minutes each, which is causing major problems for our users.
Any idea what else I can look at or do to resolve this problem? Any additional information I can provide to help you guys figure this out?
PS: Yes, I know..... 8.1 is old..... We are migrating to 8.3.x in the fall.
Thanks for your help,
Keaton
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Marlowe | 2008-07-07 22:56:23 | Re: [pg_gen] what was the "issue" with the 8.3.2 bundle ? |
Previous Message | Christophe | 2008-07-07 20:39:01 | Re: To store and retrive image data in postgresql |