From: | Israel Brewster <ijbrewster(at)alaska(dot)edu> |
---|---|
To: | Geoff Winkless <pgsqladmin(at)geoff(dot)dj> |
Cc: | PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Faster distinct query? |
Date: | 2021-09-23 20:17:31 |
Message-ID: | 3272D898-610C-45F1-B227-EB47510E4B0B@alaska.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> On Sep 23, 2021, at 10:36 AM, Geoff Winkless <pgsqladmin(at)geoff(dot)dj> wrote:
>
> On Wed, 22 Sept 2021 at 21:05, Israel Brewster <ijbrewster(at)alaska(dot)edu <mailto:ijbrewster(at)alaska(dot)edu>> wrote:
> I was wondering if there was any way to improve the performance of this query:
>
> SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;
>
> If you have tables of possible stations and channels (and if not, why not?), then an EXISTS query, something like
>
> SELECT stations.name <http://stations.name/>, ARRAY_AGG(channels.name <http://channels.name/>)
> FROM stations, channels
> WHERE EXISTS
> (SELECT FROM data WHERE data.channels=channels.name <http://channels.name/> AND data.station=stations.name <http://stations.name/>)
> GROUP BY stations.name <http://stations.name/>
>
> will usually be much faster, because it can stop scanning after the first match in the index.
So that one ran in about 5 minutes as well - apparently the time it takes to scan the index, given the similarity of run times for each of the different queries: https://explain.depesz.com/s/w46h <https://explain.depesz.com/s/w46h>
It’s making me think though, because this similar (though incomplete, and therefore incorrect result) query runs in only 19ms (https://explain.depesz.com/s/iZnN <https://explain.depesz.com/s/iZnN>):
SELECT
stations.name,
array_agg(channels.channel)
FROM stations,channels
WHERE EXISTS (SELECT
FROM data
WHERE data.station=stations.id)
GROUP BY stations.name
It’s only when I add in the AND data.channels=channels.channel that the query time blows up to 5+ minutes. I personally don’t understand why there would be such a large difference between the two queries - something wrong with my indexes?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
>
> Geoff
From | Date | Subject | |
---|---|---|---|
Next Message | Ron | 2021-09-23 20:37:51 | Re: Postgres incremental backups per db (not per cluster) |
Previous Message | Jaime Solorzano | 2021-09-23 19:57:27 | Postgres incremental backups per db (not per cluster) |