Should Explain show Parallel Hash node’s total rows?

From: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Should Explain show Parallel Hash node’s total rows?
Date: 2023-10-24 14:46:06
Message-ID: 9CC27C1D-8592-4331-8F3B-D98109A48CAF@gmail.com
Lists: pgsql-hackers


Hi, all

Should EXPLAIN show the total row count for the Parallel Hash node of a parallel-aware Hash Join?

For example, here is a non-parallel plan, where table simple has 20000 rows:

```
zml=# explain select count(*) from simple r join simple s using (id);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1309.00..1309.01 rows=1 width=8)
   ->  Hash Join  (cost=617.00..1259.00 rows=20000 width=0)
         Hash Cond: (r.id = s.id)
         ->  Seq Scan on simple r  (cost=0.00..367.00 rows=20000 width=4)
         ->  Hash  (cost=367.00..367.00 rows=20000 width=4)
               ->  Seq Scan on simple s  (cost=0.00..367.00 rows=20000 width=4)
(6 rows)
```

And here is the parallel-aware plan:

```
zml=# explain select count(*) from simple r join simple s using (id);
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=691.85..691.86 rows=1 width=8)
   ->  Gather  (cost=691.63..691.84 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=691.63..691.64 rows=1 width=8)
               ->  Parallel Hash Join  (cost=354.50..670.80 rows=8333 width=0)
                     Hash Cond: (r.id = s.id)
                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4)
                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4)
                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4)
(9 rows)
```

In initial_cost_hashjoin(), we undo the parallel division when the hash join is parallel-aware. That is reasonable, because a shared hash table must hold all of the inner rows, not just one participant's share.
We also take parallelism into account for the Hash plan's total rows when it is parallel-aware, in create_hashjoin_plan():
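```c
if (best_path->jpath.path.parallel_aware)
{
    /*
     * Parallel-aware: record the estimated row count across all
     * participants so the executor can size the shared hash table.
     */
    hash_plan->plan.parallel_aware = true;
    hash_plan->rows_total = best_path->inner_rows_total;
}
```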

However, EXPLAIN shows the Parallel Hash node with the same row count as its subplan. I'm wondering whether it would be more reasonable to show rows_total instead of plan_rows for Parallel Hash nodes.

For this example, that would give:

```
->  Parallel Hash  (rows=20000)
      ->  Parallel Seq Scan on simple s  (rows=8333)
```

Zhang Mingli
HashData https://www.hashdata.xyz
