Re: Potential performance issues related to group by and covering index

From: Hannu Krosing <hannuk(at)google(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, "Liu, Xinyu" <liuxy(at)gatech(dot)edu>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Potential performance issues related to group by and covering index
Date: 2021-03-05 01:14:40
Message-ID: CAMT0RQQH0YjgO3fEOn6GMx5hvezYS97+kv59bGb=xCCs5DMtvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

In the original example it looks like using the index (and not running
a parallel query) is what made the query slow

The fast version was brute-force sequscan(s) + sort with 3 parallel
backends (leader + 2 workers) sharing the work.

On Tue, Mar 2, 2021 at 10:42 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis(at)entrata(dot)com> wrote:
> > Are there guidelines or principles you could share about writing the group by clause such that it is more efficient?
>
> If you have the option of writing them in the same order as an
> existing btree index that covers the entire GROUP BY clause (in
> version < PG13) or at least prefix of the GROUP BY clause (version >=
> PG13), then the planner has a chance to make use of that index to
> provide pre-sorted input to do group aggregate.
>
> Since PG13 has Incremental Sort, having an index that covers only a
> prefix of the GROUP BY clause may still help.
>
> If no indexes exist then you might get better performance by putting
> the most distinct column first. That's because sorts don't need to
> compare the remaining columns once it receives two different values
> for one column. That gets more complex when the most distinct column
> is wider than the others. e.g a text compare is more expensive than
> comparing two ints. For Hash Aggregate, I don't think the order will
> matter much.
>
> David
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message val.janeiko 2021-03-05 17:55:37 Slow query performance inside a transaction on a clean database
Previous Message Justin Pryzby 2021-03-03 04:20:23 Re: tables meta data collection