From: | Zhang Mingli <zmlpostgres(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Should Explain show Parallel Hash node’s total rows? |
Date: | 2023-10-24 14:46:06 |
Message-ID: | 9CC27C1D-8592-4331-8F3B-D98109A48CAF@gmail.com |
Lists: | pgsql-hackers |
Hi, all
Should EXPLAIN show the total row count of the Parallel Hash node in a parallel-aware Hash Join?
For example, here is a non-parallel plan, where table simple has 20000 rows:
zml=# explain select count(*) from simple r join simple s using (id);
QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1309.00..1309.01 rows=1 width=8)
   ->  Hash Join  (cost=617.00..1259.00 rows=20000 width=0)
         Hash Cond: (r.id = s.id)
         ->  Seq Scan on simple r  (cost=0.00..367.00 rows=20000 width=4)
         ->  Hash  (cost=367.00..367.00 rows=20000 width=4)
               ->  Seq Scan on simple s  (cost=0.00..367.00 rows=20000 width=4)
(6 rows)
And here is the parallel-aware plan for the same query:
zml=# explain select count(*) from simple r join simple s using (id);
QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=691.85..691.86 rows=1 width=8)
   ->  Gather  (cost=691.63..691.84 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=691.63..691.64 rows=1 width=8)
               ->  Parallel Hash Join  (cost=354.50..670.80 rows=8333 width=0)
                     Hash Cond: (r.id = s.id)
                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4)
                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4)
                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4)
(9 rows)
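In initial_cost_hashjoin(), we undo the parallel division of the inner rows when the hash is parallel-aware.
That is reasonable, because a shared hash table has to hold all of the inner relation’s rows, not just one participant’s share.
Likewise, when building the plan, the Hash node’s rows_total is set to the path’s total inner rows if it is parallel-aware: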
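```c
/* Excerpt from create_hashjoin_plan() in createplan.c */
if (best_path->jpath.path.parallel_aware)
{
    /* Mark the Hash node as building a shared (parallel) hash table. */
    hash_plan->plan.parallel_aware = true;
    /* Estimated inner rows across all participants, used to size that table. */
    hash_plan->rows_total = best_path->inner_rows_total;
}
```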
However, EXPLAIN shows the Parallel Hash node with the same row count as its subplan. I’m wondering whether it would be more reasonable to show rows_total instead of plan_rows for Parallel Hash nodes.
For this example, that would give:
   ->  Parallel Hash  (rows=20000)
         ->  Parallel Seq Scan on simple s  (rows=8333)
Zhang Mingli
HashData https://www.hashdata.xyz