Quick Links

Re: Potential performance issues related to group by and covering index

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Michael Lewis <mlewis(at)entrata(dot)com>, "Liu, Xinyu" <liuxy(at)gatech(dot)edu>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Potential performance issues related to group by and covering index
Date:	2021-03-05 01:14:40
Message-ID:	CAMT0RQQH0YjgO3fEOn6GMx5hvezYS97+kv59bGb=xCCs5DMtvw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

In the original example it looks like using the index (and not running
a parallel query) is what made the query slow

The fast version was brute-force sequscan(s) + sort with 3 parallel
backends (leader + 2 workers) sharing the work.

On Tue, Mar 2, 2021 at 10:42 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis(at)entrata(dot)com> wrote:
> > Are there guidelines or principles you could share about writing the group by clause such that it is more efficient?
>
> If you have the option of writing them in the same order as an
> existing btree index that covers the entire GROUP BY clause (in
> version < PG13) or at least prefix of the GROUP BY clause (version >=
> PG13), then the planner has a chance to make use of that index to
> provide pre-sorted input to do group aggregate.
>
> Since PG13 has Incremental Sort, having an index that covers only a
> prefix of the GROUP BY clause may still help.
>
> If no indexes exist then you might get better performance by putting
> the most distinct column first. That's because sorts don't need to
> compare the remaining columns once it receives two different values
> for one column. That gets more complex when the most distinct column
> is wider than the others. e.g a text compare is more expensive than
> comparing two ints. For Hash Aggregate, I don't think the order will
> matter much.
>
> David
>
>

In response to

Re: Potential performance issues related to group by and covering index at 2021-03-02 21:41:24 from David Rowley

Browse pgsql-performance by date

	From	Date	Subject
Next Message	val.janeiko	2021-03-05 17:55:37	Slow query performance inside a transaction on a clean database
Previous Message	Justin Pryzby	2021-03-03 04:20:23	Re: tables meta data collection