Re: Potential performance issues related to group by and covering index

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: "Liu, Xinyu" <liuxy(at)gatech(dot)edu>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Potential performance issues related to group by and covering index
Date: 2021-03-02 21:41:24
Message-ID: CAApHDvps6RVbRf4pSkwX70dXfegiE5LZt1iY5Q16jnFRc9xBbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis(at)entrata(dot)com> wrote:
> Are there guidelines or principles you could share about writing the group by clause such that it is more efficient?

If you have the option of writing them in the same order as an
existing btree index that covers the entire GROUP BY clause (in
version < PG13) or at least prefix of the GROUP BY clause (version >=
PG13), then the planner has a chance to make use of that index to
provide pre-sorted input to do group aggregate.

Since PG13 has Incremental Sort, having an index that covers only a
prefix of the GROUP BY clause may still help.

If no indexes exist then you might get better performance by putting
the most distinct column first. That's because sorts don't need to
compare the remaining columns once it receives two different values
for one column. That gets more complex when the most distinct column
is wider than the others. e.g a text compare is more expensive than
comparing two ints. For Hash Aggregate, I don't think the order will
matter much.

David

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nagaraj Raj 2021-03-03 01:20:51 tables meta data collection
Previous Message Michael Lewis 2021-03-02 21:04:24 Re: Potential performance issues related to group by and covering index