Quick Links

Re: SELECT DISTINCT too slow

From:	Miroslav Šulc <miroslav(dot)sulc(at)startnet(dot)cz>
To:
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: SELECT DISTINCT too slow
Date:	2006-06-01 16:06:53
Message-ID:	447F109D.5030400@startnet.cz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Greg Stark napsal(a):
> Actually you could try the equivalent query:
>
> SELECT Key FROM MRTPContactValue GROUP BY Key
>
> This may or may not be faster because it can use a hash aggregate plan. I
> would expect it to be faster here because there are few distinct keys and the
> planner predicts that.
>
> Eventually these two queries should be handled the same by Postgres but Hash
> Aggregates are a new addition and DISTINCT/DISTINCT ON hasn't been adapted to
> make use of them.
>
> Also, incidentally, I don't see how a table of possible keys could help you
> here. Nothing forces they table MRTPContactValue to use all possible keys...
>

I simpified the case because it was slow by itself. GROUP BY really
makes this a lot faster.

The table contains properties for each contact that I cannot control how
many properties and what names of the properties there will be. In my
scenario user can export the data through user interface and I need to
know what keys are used there to create appropriate column names. There
is even one constraint. The contacts are grouped into groups so I need
to get only the keys from a selected group. The real query is this
(which is not so fast as the plain SELECT ... GROUP BY ... because the
other table is also large enough) but now it is faster than before:

SELECT Key FROM MRTPContactValue
INNER JOIN MRTPContact
ON MRTPContactValue.MRTPContactId = MRTPContact.Id
WHERE MRTPContact.MRTPWaveQuestionnaireId = 1
GROUP BY Key

Here's the query plan:

QUERY PLAN

HashAggregate (cost=32639.67..32639.76 rows=9 width=9) (actual time=19407.116..19407.146 rows=10 loops=1)

-> Hash Join (cost=8070.36..31598.51 rows=416466 width=9) (actual time=5917.367..17607.502 rows=416466 loops=1)

Hash Cond: ("outer".mrtpcontactid = "inner".id)

-> Seq Scan on mrtpcontactvalue (cost=0.00..8669.66 rows=416466 width=17) (actual time=9.094..4233.131 rows=416466 loops=1)

-> Hash (cost=7119.80..7119.80 rows=137824 width=8) (actual time=5096.750..5096.750 rows=137824 loops=1)

-> Seq Scan on mrtpcontact (cost=0.00..7119.80 rows=137824 width=8) (actual time=9.312..4337.647 rows=137824 loops=1)

Filter: (mrtpwavequestionnaireid = 1)

Total runtime: 19417.873 ms

The same query using DISTINCT takes about 40 sec to complete.

Thank you.

--
Miroslav Šulc

Attachment	Content-Type	Size
miroslav.sulc.vcf	text/x-vcard	349 bytes

In response to

Re: SELECT DISTINCT too slow at 2006-06-01 15:46:28 from Greg Stark

Browse pgsql-sql by date

	From	Date	Subject
Next Message	postgres	2006-06-01 16:44:50	Re: Table design question
Previous Message	Greg Stark	2006-06-01 15:46:28	Re: SELECT DISTINCT too slow