Quick Links

Performance woes relating to DISTINCT (I think)

From:	boinger <boinger(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Performance woes relating to DISTINCT (I think)
Date:	2005-09-26 20:48:04
Message-ID:	9e6d8b530509261348fe7d1de@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hello.

I'm not sure if this is a question suited for here, the -sql list, or
the -performance list, so if I'm mis-posting, please direct me to the
right list.

I was unable to come up with anything relevant from the archives
(though...I am not too sure where to start so I may just have been
looking for the wrong things).

I am trying to convert a bunch of code from MySQL to Postgresql. In
MySQL I was running this particular query with more complexity (with
joins and such) and it takes far less than a second. The Pg
stripped-down version takes over 45 seconds (it was taking over 80
seconds with the joins).

The table-in-question (tasks_applied) contains 12 columns:
cid, modcode, yearcode, seid, tid, actid, pkgid, optional, corrected,
labor, lmid, parts_price

I have the following indexes:
CREATE INDEX actid ON tasks_applied USING btree (actid)
CREATE INDEX pkgid ON tasks_applied USING btree (pkgid)
CREATE INDEX strafe_group ON tasks_applied USING btree (modcode,
yearcode, seid, tid) WHERE cid = 0
CREATE UNIQUE INDEX tasks_applied_pkey ON tasks_applied USING btree
(cid, modcode, yearcode, seid, tid, actid, optional)

Here is my Pg query:
-----------------
SELECT DISTINCT
modcode,
yearcode,
seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied
WHERE
cid=0 AND
seid=100001
GROUP BY
modcode,
yearcode,
seid
HAVING
COUNT(tid)>=0
ORDER BY
modcode ASC,
yearcode ASC,
seid ASC
---------------

Here's my EXPLAIN ANAYZE output:
-----------------
QUERY PLAN

Unique (cost=85168.84..85168.98 rows=11 width=22) (actual
time=45602.908..45607.399 rows=515 loops=1)

-> Sort (cost=85168.84..85168.87 rows=11 width=22) (actual
time=45602.897..45604.286 rows=515 loops=1)

Sort Key: modcode, yearcode, seid, count(DISTINCT tid)

-> GroupAggregate (cost=0.00..85168.65 rows=11 width=22)
(actual time=3149.916..45578.292 rows=515 loops=1)

Filter: (count(tid) >= 0)

-> Index Scan using strafe_group on tasks_applied
(cost=0.00..85167.23 rows=107 width=22) (actual
time=3144.908..45366.147 rows=29893 loops=1)

Filter: ((cid = 0) AND (seid = 100001))

Total runtime: 45609.207 ms

-------------

Finally, here's my MySQL query:
SELECT
tasks_applied.modcode AS modcode,
vin_models.shortname AS shortname,
vin_years.year AS year,
vin_years.yearcode AS yearcode,
service_events.details AS se,
service_events.intl_details AS i_se,
service_events.seid AS seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied,
service_events,
vin_models,
vin_years
WHERE
cid=0
AND tasks_applied.yearcode=vin_years.yearcode
AND tasks_applied.modcode=vin_models.modcode
AND tasks_applied.seid=service_events.seid
AND tasks_applied.seid=100001
GROUP BY
se, modcode, year
HAVING
COUNT(tid)>=0
ORDER BY
tasks_applied.modcode ASC,
vin_years.year ASC,
service_events.seid ASC
---------

Any help would be greatly appreciated (even if it's just "RTFM on xxx").

Thanks

--jeff

Responses

Re: Performance woes relating to DISTINCT (I think) at 2005-09-27 12:08:33 from Dawid Kuroczko

Browse pgsql-general by date

	From	Date	Subject
Next Message	Jim C. Nasby	2005-09-26 21:00:33	Re: How many insert + update should one transaction handle?
Previous Message	Don Isgitt	2005-09-26 19:44:42	Re: Index use in BETWEEN statement...