From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: An improvement on parallel DISTINCT |
Date: | 2024-02-02 10:39:25 |
Message-ID: | CAApHDvrcDvMpAraQ15TzANgKed9pREBCZ+qXLiuKrfHPf_jhaA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 2 Feb 2024 at 20:47, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
>
> On Fri, Feb 2, 2024 at 11:26 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> In light of this, do you still think it's worthwhile making this change?
>>
>> For me, I think all it's going to result in is extra planner work
>> without any performance gains.
>
>
> Hmm, with the query below, I can see that the new plan is cheaper than
> the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.
>
> create table t (a int, b int);
> insert into t select i%100000, i from generate_series(1,10000000)i;
> analyze t;
>
> explain (costs on) select distinct a from t order by a limit 1;
OK, a LIMIT clause... I didn't think of that. Given the test results
below, I'm pretty convinced we should make the change.
Performance testing on an AMD 3990x with work_mem=4MB and hash_mem_multiplier=2.
$ cat bench.sql
select distinct a from t order by a limit 1;
$ pgbench -n -T 60 -f bench.sql postgres
-- Master
max_parallel_workers_per_gather=2;
latency average = 470.310 ms
latency average = 468.673 ms
latency average = 469.463 ms
max_parallel_workers_per_gather=4;
latency average = 346.012 ms
latency average = 346.662 ms
latency average = 347.591 ms
max_parallel_workers_per_gather=8; + alter table t set (parallel_workers=8);
latency average = 300.298 ms
latency average = 300.029 ms
latency average = 300.314 ms
-- Patched
max_parallel_workers_per_gather=2;
latency average = 424.176 ms
latency average = 431.870 ms
latency average = 431.870 ms (9.36% faster than master)
max_parallel_workers_per_gather=4;
latency average = 279.837 ms
latency average = 280.893 ms
latency average = 281.518 ms (23.51% faster than master)
max_parallel_workers_per_gather=8; + alter table t set (parallel_workers=8);
latency average = 178.585 ms
latency average = 178.780 ms
latency average = 179.768 ms (67.68% faster than master)
So the gains increase with more parallel workers due to pushing more
work to the worker. Amdahl's law approves of this.
I'll push the patch shortly.
David
From | Date | Subject | |
---|---|---|---|
Next Message | Ranier Vilela | 2024-02-02 11:20:30 | Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c) |
Previous Message | Thomas Munro | 2024-02-02 10:18:18 | InstallXLogFileSegment() vs concurrent WAL flush |