Re: BUG #16887: Group by is faster than distinct

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	liuxy(at)gatech(dot)edu, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	Re: BUG #16887: Group by is faster than distinct
Date:	2021-02-23 05:28:18
Message-ID:	CAApHDvrAgN4APYrsoMGoAhps6zsa2SEom5QW+O-ZqEpjggm-6w@mail.gmail.com
Lists:	pgsql-bugs

On Tue, 23 Feb 2021 at 10:26, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> Actual Behavior
> We executed both queries on the TPC-H benchmark of scale factor 5: the first
> query takes over 20 seconds, while the second query only takes 6.5 seconds.
> We think the time difference results from different plans selected.
> Specifically, in the first (slow) query, the optimizer decides to not
> parallelize the SCAN and GROUP operations.

> Expected Behavior
> Since these two queries are semantically equivalent, we were hoping that
> PostgreSQL will evaluate them in roughly the same amount of time.

It makes sense that you'd expect this. However, we don't currently
generate parallel plans for DISTINCT queries, so this is more a
feature that's yet to be implemented than a bug.
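
Until that exists, writing the query with GROUP BY over the same columns
returns the same rows and leaves the planner free to consider a parallel
aggregate. A minimal sketch, assuming the TPC-H lineitem table and its
l_orderkey column (the queries from the report are not quoted in this
message, so both names are only illustrative, and whether a parallel plan
is actually chosen depends on table size and settings):

```sql
-- DISTINCT form: per the explanation above, no parallel plan is
-- currently generated for this.
EXPLAIN
SELECT DISTINCT l_orderkey
FROM lineitem;

-- Equivalent GROUP BY form: returns the same set of rows, and the
-- planner may pick a parallel aggregate for it.
EXPLAIN
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey;
```

Comparing the two EXPLAIN outputs should show whether a Gather node
appears in the second plan but not in the first.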

When parallel aggregates were added in 9.6, it was quite late in the
cycle and I narrowed the scope to just GROUP BY. DISTINCT was left
behind. I tried to pick that up again several years ago, but I was
encouraged to drop it in favour of other work.

David

In response to

BUG #16887: Group by is faster than distinct at 2021-02-22 21:20:23 from PG Bug reporting form
