Quick Links

Faster distinct query?

From:	Israel Brewster <ijbrewster(at)alaska(dot)edu>
To:	PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject:	Faster distinct query?
Date:	2021-09-22 20:05:22
Message-ID:	0BB059BF-54BD-46FA-8B26-28FEF9A04198@alaska.edu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I was wondering if there was any way to improve the performance of this query:

SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;

The explain execution plan can be found here:
https://explain.depesz.com/s/mtxB#html <https://explain.depesz.com/s/mtxB#html>

and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that is a list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels associated with each station). This query takes around 5 minutes to run.

To work around the issue, I created a materialized view that I can update periodically, and of course I can query said view in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will also grow (correct me if I am wrong there).

This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel)
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

Responses

Re: Faster distinct query? at 2021-09-22 20:20:28 from David G. Johnston
Re: Faster distinct query? at 2021-09-22 20:20:46 from Michael Lewis
Re: Faster distinct query? at 2021-09-23 07:04:35 from hubert depesz lubaczewski
Re: Faster distinct query? at 2021-09-23 18:36:48 from Geoff Winkless

Browse pgsql-general by date

	From	Date	Subject
Next Message	David G. Johnston	2021-09-22 20:20:28	Re: Faster distinct query?
Previous Message	Tom Lane	2021-09-22 19:16:23	Re: Remove duplicated row in pg_largeobject_metadata