Re: SELECT DISTINCT too slow

From: Miroslav Šulc <miroslav(dot)sulc(at)startnet(dot)cz>
To:
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT DISTINCT too slow
Date: 2006-06-01 16:06:53
Message-ID: 447F109D.5030400@startnet.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Greg Stark napsal(a):
> Actually you could try the equivalent query:
>
> SELECT Key FROM MRTPContactValue GROUP BY Key
>
> This may or may not be faster because it can use a hash aggregate plan. I
> would expect it to be faster here because there are few distinct keys and the
> planner predicts that.
>
> Eventually these two queries should be handled the same by Postgres but Hash
> Aggregates are a new addition and DISTINCT/DISTINCT ON hasn't been adapted to
> make use of them.
>
> Also, incidentally, I don't see how a table of possible keys could help you
> here. Nothing forces they table MRTPContactValue to use all possible keys...
>

I simpified the case because it was slow by itself. GROUP BY really
makes this a lot faster.

The table contains properties for each contact that I cannot control how
many properties and what names of the properties there will be. In my
scenario user can export the data through user interface and I need to
know what keys are used there to create appropriate column names. There
is even one constraint. The contacts are grouped into groups so I need
to get only the keys from a selected group. The real query is this
(which is not so fast as the plain SELECT ... GROUP BY ... because the
other table is also large enough) but now it is faster than before:

SELECT Key FROM MRTPContactValue
INNER JOIN MRTPContact
ON MRTPContactValue.MRTPContactId = MRTPContact.Id
WHERE MRTPContact.MRTPWaveQuestionnaireId = 1
GROUP BY Key

Here's the query plan:

QUERY PLAN

HashAggregate (cost=32639.67..32639.76 rows=9 width=9) (actual time=19407.116..19407.146 rows=10 loops=1)

-> Hash Join (cost=8070.36..31598.51 rows=416466 width=9) (actual time=5917.367..17607.502 rows=416466 loops=1)

Hash Cond: ("outer".mrtpcontactid = "inner".id)

-> Seq Scan on mrtpcontactvalue (cost=0.00..8669.66 rows=416466 width=17) (actual time=9.094..4233.131 rows=416466 loops=1)

-> Hash (cost=7119.80..7119.80 rows=137824 width=8) (actual time=5096.750..5096.750 rows=137824 loops=1)

-> Seq Scan on mrtpcontact (cost=0.00..7119.80 rows=137824 width=8) (actual time=9.312..4337.647 rows=137824 loops=1)

Filter: (mrtpwavequestionnaireid = 1)

Total runtime: 19417.873 ms

The same query using DISTINCT takes about 40 sec to complete.

Thank you.

--
Miroslav Šulc

Attachment Content-Type Size
miroslav.sulc.vcf text/x-vcard 349 bytes

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message postgres 2006-06-01 16:44:50 Re: Table design question
Previous Message Greg Stark 2006-06-01 15:46:28 Re: SELECT DISTINCT too slow