From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | An improvement on parallel DISTINCT |
Date: | 2023-12-26 11:23:02 |
Message-ID: | CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com |
Lists: | pgsql-hackers |
While reviewing Heikki's Omit-junk-columns patchset [1], I noticed that
root->upper_targets[] is used to set the target for partial_distinct_rel,
which is not great because root->upper_targets[] is not supposed to be
used by the core code. The comment in grouping_planner() says:
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info.
Then, while fixing this issue, I noticed an opportunity to improve
how we generate Gather/GatherMerge paths for two-phase DISTINCT.
The Gather/GatherMerge paths are added by generate_gather_paths(), which
does not consider ordering that might be useful above the GatherMerge
node. This can be improved by using generate_useful_gather_paths()
instead. With this change I can see a query plan improvement in the
regression test "select_distinct.sql". For instance,
-- Test parallel DISTINCT
SET parallel_tuple_cost=0;
SET parallel_setup_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=2;
-- Ensure we get a parallel plan
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
-- on master
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Sort
         Sort Key: four
         ->  Gather
               Workers Planned: 2
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)
-- on patched
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: four
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)
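I believe the second plan is better: the Sort now runs in the parallel
workers, on each worker's already-deduplicated rows, instead of in the
leader on the whole gathered set.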
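As a rough illustration of how the two plan shapes differ, here is a toy model in Python. It is not planner code: the function names and the two-way split of the input are made up for the example, and the list `rows` is only a stand-in for tenk1.four. Both shapes produce the same result; the difference is where the sort happens.

```python
import heapq
from itertools import groupby


def hash_aggregate(rows):
    """Per-worker partial DISTINCT: drops duplicates, output order unspecified."""
    return list(set(rows))


def unique(sorted_rows):
    """Unique node: collapses adjacent duplicates in already-sorted input."""
    return [key for key, _ in groupby(sorted_rows)]


def plan_master(partitions):
    # Unique -> Sort -> Gather -> HashAggregate
    # The leader gathers unordered rows and sorts all of them itself.
    gathered = [v for part in partitions for v in hash_aggregate(part)]
    return unique(sorted(gathered))


def plan_patched(partitions):
    # Unique -> Gather Merge -> Sort -> HashAggregate
    # Each worker sorts its own deduplicated rows; the leader only merges
    # the already-sorted streams.
    sorted_streams = [sorted(hash_aggregate(part)) for part in partitions]
    return unique(heapq.merge(*sorted_streams))


# Hypothetical input: 10000 values in 0..3, split between two workers.
rows = [i % 4 for i in range(10000)]
partitions = [rows[:5000], rows[5000:]]

print(plan_master(partitions))
print(plan_patched(partitions))
```

In this model the leader's work in the patched shape is a merge of sorted streams, while the master shape has it sort every gathered row.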
Attached is a patch that includes this change and also eliminates the
usage of root->upper_targets[] in the core code. It also makes some
tweaks to the comment.
Any thoughts?
[1]
https://www.postgresql.org/message-id/flat/2ca5865b-4693-40e5-8f78-f3b45d5378fb%40iki.fi
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Improve-parallel-DISTINCT.patch | application/octet-stream | 4.6 KB |