From: | boinger <boinger(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Performance woes relating to DISTINCT (I think) |
Date: | 2005-09-26 20:48:04 |
Message-ID: | 9e6d8b530509261348fe7d1de@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello.
I'm not sure if this is a question suited for here, the -sql list, or
the -performance list, so if I'm mis-posting, please direct me to the
right list.
I was unable to come up with anything relevant from the archives
(though...I am not too sure where to start so I may just have been
looking for the wrong things).
I am trying to convert a bunch of code from MySQL to Postgresql. In
MySQL I was running this particular query with more complexity (with
joins and such) and it takes far less than a second. The Pg
stripped-down version takes over 45 seconds (it was taking over 80
seconds with the joins).
The table-in-question (tasks_applied) contains 12 columns:
cid, modcode, yearcode, seid, tid, actid, pkgid, optional, corrected,
labor, lmid, parts_price
I have the following indexes:
CREATE INDEX actid ON tasks_applied USING btree (actid)
CREATE INDEX pkgid ON tasks_applied USING btree (pkgid)
CREATE INDEX strafe_group ON tasks_applied USING btree (modcode,
yearcode, seid, tid) WHERE cid = 0
CREATE UNIQUE INDEX tasks_applied_pkey ON tasks_applied USING btree
(cid, modcode, yearcode, seid, tid, actid, optional)
Here is my Pg query:
-----------------
SELECT DISTINCT
modcode,
yearcode,
seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied
WHERE
cid=0 AND
seid=100001
GROUP BY
modcode,
yearcode,
seid
HAVING
COUNT(tid)>=0
ORDER BY
modcode ASC,
yearcode ASC,
seid ASC
---------------
Here's my EXPLAIN ANAYZE output:
-----------------
QUERY PLAN
Unique (cost=85168.84..85168.98 rows=11 width=22) (actual
time=45602.908..45607.399 rows=515 loops=1)
-> Sort (cost=85168.84..85168.87 rows=11 width=22) (actual
time=45602.897..45604.286 rows=515 loops=1)
Sort Key: modcode, yearcode, seid, count(DISTINCT tid)
-> GroupAggregate (cost=0.00..85168.65 rows=11 width=22)
(actual time=3149.916..45578.292 rows=515 loops=1)
Filter: (count(tid) >= 0)
-> Index Scan using strafe_group on tasks_applied
(cost=0.00..85167.23 rows=107 width=22) (actual
time=3144.908..45366.147 rows=29893 loops=1)
Filter: ((cid = 0) AND (seid = 100001))
Total runtime: 45609.207 ms
-------------
Finally, here's my MySQL query:
SELECT
tasks_applied.modcode AS modcode,
vin_models.shortname AS shortname,
vin_years.year AS year,
vin_years.yearcode AS yearcode,
service_events.details AS se,
service_events.intl_details AS i_se,
service_events.seid AS seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied,
service_events,
vin_models,
vin_years
WHERE
cid=0
AND tasks_applied.yearcode=vin_years.yearcode
AND tasks_applied.modcode=vin_models.modcode
AND tasks_applied.seid=service_events.seid
AND tasks_applied.seid=100001
GROUP BY
se, modcode, year
HAVING
COUNT(tid)>=0
ORDER BY
tasks_applied.modcode ASC,
vin_years.year ASC,
service_events.seid ASC
---------
Any help would be greatly appreciated (even if it's just "RTFM on xxx").
Thanks
--jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Jim C. Nasby | 2005-09-26 21:00:33 | Re: How many insert + update should one transaction handle? |
Previous Message | Don Isgitt | 2005-09-26 19:44:42 | Re: Index use in BETWEEN statement... |