Re: Faster distinct query?

From: Ryan Booz <ryan(at)timescale(dot)com>
To: depesz(at)depesz(dot)com
Cc: Israel Brewster <ijbrewster(at)alaska(dot)edu>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Faster distinct query?
Date: 2021-09-23 12:34:54
Message-ID: CADyMnEzz2QQJu1J9tDCO36HSrsGCpa1o3Lg1wDQCW_i-ShRt9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Heh, I honestly forgot about the recursive CTE. Certainly worth a try and
wouldn't require installing other extensions.

This is what depesz is referring to:
https://wiki.postgresql.org/wiki/Loose_indexscan

On Thu, Sep 23, 2021 at 3:04 AM hubert depesz lubaczewski <depesz(at)depesz(dot)com>
wrote:

> On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote:
> > I was wondering if there was any way to improve the performance of this
> query:
> >
> > SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP
> BY station;
> >
> > The explain execution plan can be found here:
> > https://explain.depesz.com/s/mtxB#html <
> https://explain.depesz.com/s/mtxB#html>
> >
> > and it looks pretty straight forward. It does an index_only scan,
> followed by an aggregate, to produce a result that is a list of stations
> along with a list of channels associated with each (there can be anywhere
> from 1 to 3 channels associated with each station). This query takes around
> 5 minutes to run.
> >
> > To work around the issue, I created a materialized view that I can
> update periodically, and of course I can query said view in no time flat.
> However, I’m concerned that as the dataset grows, the time it takes to
> refresh the view will also grow (correct me if I am wrong there).
> >
> > This is running PostgreSQL 13, and the index referenced is a two-column
> index on data(station, channel)
>
> It looks that there is ~ 170 stations, and ~ 800 million rows int he
> table.
>
> can you tell us how many rows has this:
>
> select distinct station, channel from data;
>
> If this is not huge, then you can make the query run much faster using
> skip scan - recursive cte.
>
> Best regards,
>
> depesz
>
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Clive Swan 2021-09-23 13:37:11 RE: Get COUNT results from two different columns
Previous Message Tobias Meyer 2021-09-23 10:49:54 Re: Remove duplicated row in pg_largeobject_metadata