From: | hubert depesz lubaczewski <depesz(at)depesz(dot)com> |
---|---|
To: | Israel Brewster <ijbrewster(at)alaska(dot)edu> |
Cc: | PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Faster distinct query? |
Date: | 2021-09-23 07:04:35 |
Message-ID: | 20210923070435.GB8425@depesz.com |
Lists: | pgsql-general |
On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote:
> I was wondering if there was any way to improve the performance of this query:
>
> SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;
>
> The explain execution plan can be found here:
> https://explain.depesz.com/s/mtxB#html
>
> and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that is a list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels associated with each station). This query takes around 5 minutes to run.
>
> To work around the issue, I created a materialized view that I can update periodically, and of course I can query said view in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will also grow (correct me if I am wrong there).
>
> This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel)
It looks like there are ~170 stations, and ~800 million rows in the
table.
Can you tell us how many rows this returns:
select distinct station, channel from data;
If this is not huge, then you can make the query run much faster using
a skip scan (loose index scan) implemented with a recursive CTE.
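As a rough sketch of what I mean (untested against your schema, but it
relies only on the data(station, channel) index you mentioned): the
recursive CTE jumps from one (station, channel) pair to the next-larger
one, touching a handful of index entries per distinct pair instead of
scanning all ~800 million rows:

```sql
-- Sketch of a "loose index scan" via recursive CTE.
-- Assumes the existing two-column index on data(station, channel).
WITH RECURSIVE t AS (
    -- seed: smallest (station, channel) pair in the index
    (SELECT station, channel FROM data
     ORDER BY station, channel LIMIT 1)
    UNION ALL
    -- step: smallest pair strictly greater than the previous one
    SELECT n.station, n.channel
    FROM t
    CROSS JOIN LATERAL (
        SELECT station, channel
        FROM data
        WHERE (station, channel) > (t.station, t.channel)
        ORDER BY station, channel
        LIMIT 1
    ) n
)
SELECT station, array_agg(channel) AS channels
FROM t
GROUP BY station;
```

Each step is a single index lookup, so the total cost is proportional
to the number of distinct (station, channel) pairs (a few hundred, by
the sound of it), not to the table size.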
Best regards,
depesz