pgsql: Allow parallel DISTINCT

From:	David Rowley <drowley(at)postgresql(dot)org>
To:	pgsql-committers(at)lists(dot)postgresql(dot)org
Subject:	pgsql: Allow parallel DISTINCT
Date:	2021-08-22 11:31:35
Message-ID:	E1mHlhP-0005P0-8X@gemulon.postgresql.org
Lists:	pgsql-committers

Allow parallel DISTINCT

We've supported parallel aggregation since e06a38965. At the time, we
didn't quite get around to also adding parallel DISTINCT. So, let's do
that now.

This is implemented by introducing a two-phase DISTINCT. Phase 1 is
performed on parallel workers, where rows are made distinct either by
hashing or by sort/unique. The results from the parallel workers are
then combined, and a final distinct phase is performed serially to
remove any duplicate rows that appear when the workers' rows are
combined.
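
The two phases described above can be illustrated with a small Python
sketch. This is only a conceptual model, not the planner code touched by
this commit: the function names (partial_distinct, two_phase_distinct),
the round-robin split of rows, and the use of in-process lists in place
of real parallel workers are all assumptions made for illustration. Only
the hashing strategy is shown; sort/unique would serve equally well for
either phase.

```python
from typing import Hashable, Iterable, List, Sequence


def partial_distinct(rows: Iterable[Hashable]) -> List[Hashable]:
    """Remove duplicates from one stream of rows using a hash set."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def two_phase_distinct(rows: Sequence[Hashable], n_workers: int) -> List[Hashable]:
    """Hypothetical model of a two-phase DISTINCT over n_workers workers."""
    # Assumed round-robin split, standing in for a parallel scan that
    # hands each worker some share of the input rows.
    chunks = [rows[i::n_workers] for i in range(n_workers)]

    # Phase 1: each worker makes its own rows distinct. In a real system
    # these calls would run concurrently; here they run one after another.
    partials = [partial_distinct(chunk) for chunk in chunks]

    # Combine the workers' results. The same value may have survived
    # phase 1 in more than one worker, so duplicates can reappear here.
    gathered = [row for partial in partials for row in partial]

    # Phase 2: a single serial pass removes those cross-worker duplicates.
    return partial_distinct(gathered)


if __name__ == "__main__":
    print(sorted(two_phase_distinct([1, 2, 2, 3, 1, 3, 4, 1], n_workers=3)))
```

With this input and split, the value 1 survives phase 1 in two different
workers, which is why the serial second phase is still needed after the
results are combined.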

Author: David Rowley
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvrjRxVKwQN0he79xS+9wyotFXL=RmoWqGGO2N45Farpgw@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/22c4e88ebff408acd52e212543a77158bde59e69

Modified Files
--------------
src/backend/optimizer/README | 1 +
src/backend/optimizer/plan/planner.c | 219 ++++++++++++++++++++++----
src/include/nodes/pathnodes.h | 1 +
src/test/regress/expected/select_distinct.out | 67 ++++++++
src/test/regress/sql/select_distinct.sql | 37 +++++
5 files changed, 292 insertions(+), 33 deletions(-)
