Quick Links

Re: Faster distinct query?

From:	Michael Lewis <mlewis(at)entrata(dot)com>
To:	Israel Brewster <ijbrewster(at)alaska(dot)edu>
Cc:	PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Faster distinct query?
Date:	2021-09-22 20:20:46
Message-ID:	CAHOFxGrvxWFTxNQZNe-cWgf__5ErEZ5JUoFzHNXGGnK9=oqX-A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

In the future, please share the plan returned by explain analyze, and some
data about how many rows in the involved tables, what type of system you
are running it on, any changes from default configs, etc.

How does the below work? It should do a very simple index scan only, then
aggregate the relative few rows after the fact.

select station, array_agg(distinct(channel)) as channels
FROM(
SELECT station,channel FROM data GROUP BY station,channel
) AS sub
group by station;

If there is correlation between station & channel, then you might look at
creating a multivariate statistics object and analyzing the table so the
planner can make better choices, knowing that channel is dependent on
station perhaps. I wouldn't necessarily think that it would help this
query, but perhaps others. Also, you might try creating only dependencies,
only ndistinct type, or some combination other than all 3 types.

ref- https://www.postgresql.org/docs/current/sql-createstatistics.html

In response to

Faster distinct query? at 2021-09-22 20:05:22 from Israel Brewster

Responses

Re: Faster distinct query? at 2021-09-22 20:26:55 from David G. Johnston
Re: Faster distinct query? at 2021-09-23 00:58:31 from David Rowley

Browse pgsql-general by date

	From	Date	Subject
Next Message	David G. Johnston	2021-09-22 20:26:55	Re: Faster distinct query?
Previous Message	David G. Johnston	2021-09-22 20:20:28	Re: Faster distinct query?