Re: Question of Parallel Hash Join on TPC-H Benchmark

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Question of Parallel Hash Join on TPC-H Benchmark
Date: 2024-10-13 03:09:29
Message-ID: 59f605ce-00f2-401b-be1e-5b684326ab6e@gmail.com
Lists: pgsql-bugs

On 13/10/2024 01:11, Ba Jinsheng wrote:
> I changed the code to generate an efficient query plan (only the
> HashJoin in fifth line is in parallel), so I am wondering whether it is
> possible to optimize the code to enable this efficient query plan in
> default? I believe at least, it would improve the performance of
> PostgreSQL on the standard benchmark TPC-H.
> If you need, I can provide my environment in docker for your analysis.
Thanks for the report! It is a pretty interesting case.

The answer to your question is simple: we have a correlated subquery
here with the clause 'p_partkey = ps_partkey'. Because of that, the top
JOIN operator has a parameterised inner path, and the current code
cannot use parallel workers for it. See the comment in the code:

/*
 * If the inner path is parameterised, we can't use a partial hashjoin.
 * Parameterised partial paths are not supported. The caller should
 * already have verified that no lateral rels are required here.
 */

Even if you transform the subquery into a SEMI JOIN, you will still have
a parameterised join, and the parallel plan will be rejected.
So, the best you can do here is replace the clause
'ps_supplycost = (SELECT ...)' with 'ps_supplycost IN (SELECT ...)'.
I got a substantial speedup with that change (see attached).
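
To make the rewrite concrete, here is a minimal sketch. It assumes a
cut-down, made-up schema (only part and partsupp, with invented rows)
rather than the full benchmark query, and it uses SQLite through
Python's sqlite3 module purely to check that the '=' and 'IN' forms
return the same rows. It says nothing about which plan PostgreSQL
picks; for that, see the attached EXPLAIN output.

```python
import sqlite3

# Toy stand-ins for two TPC-H tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER PRIMARY KEY);
    CREATE TABLE partsupp (ps_partkey INTEGER, ps_suppkey INTEGER,
                           ps_supplycost REAL);
    INSERT INTO part VALUES (1), (2), (3);
    INSERT INTO partsupp VALUES
        (1, 10, 5.0), (1, 11, 3.0), (1, 12, 3.0),
        (2, 10, 7.0), (2, 11, 9.0);
""")

# Shared query skeleton; {op} is either "=" or "IN".
template = """
    SELECT p.p_partkey, ps.ps_suppkey, ps.ps_supplycost
    FROM part AS p, partsupp AS ps
    WHERE p.p_partkey = ps.ps_partkey
      AND ps.ps_supplycost {op} (SELECT min(ps2.ps_supplycost)
                                 FROM partsupp AS ps2
                                 WHERE ps2.ps_partkey = p.p_partkey)
    ORDER BY p.p_partkey, ps.ps_suppkey
"""

# Original form: scalar comparison against the correlated subquery.
rows_eq = conn.execute(template.format(op="=")).fetchall()
# Rewritten form: IN over the same subquery.
rows_in = conn.execute(template.format(op="IN")).fetchall()

# The subquery is a single aggregate without GROUP BY, so it yields exactly
# one row per outer row and the two forms select the same tuples.
print(rows_eq == rows_in)
```

The two forms agree here only because the subquery is a plain aggregate
that always produces exactly one row; with a subquery that can return
several rows, '=' would raise an error in PostgreSQL while 'IN' would
not.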

So, what can we improve here?
- I am sceptical that we will get parallel plans for parameterised
paths, at least any time soon.
- Improve pull-ups for subqueries, especially when we can prove that the
subquery is a single aggregate returning only one tuple. It looks doable.
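
As a rough illustration of what such a pull-up could produce, the
sketch below writes the decorrelated form by hand: the correlated
min() subquery becomes a join against a derived table grouped by the
correlation key. This is an assumption about the shape of the result,
not a description of any existing planner code. The schema and rows
are invented, and SQLite (via Python's sqlite3) is used only to check
that both forms return the same rows.

```python
import sqlite3

# Toy stand-ins for two TPC-H tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER PRIMARY KEY);
    CREATE TABLE partsupp (ps_partkey INTEGER, ps_suppkey INTEGER,
                           ps_supplycost REAL);
    INSERT INTO part VALUES (1), (2), (3);
    INSERT INTO partsupp VALUES
        (1, 10, 5.0), (1, 11, 3.0), (1, 12, 3.0),
        (2, 10, 7.0), (2, 11, 9.0);
""")

# Correlated form: the subquery is re-evaluated per outer row, which is
# what makes the inner side of the join parameterised.
correlated = """
    SELECT p.p_partkey, ps.ps_suppkey, ps.ps_supplycost
    FROM part AS p
    JOIN partsupp AS ps ON ps.ps_partkey = p.p_partkey
    WHERE ps.ps_supplycost = (SELECT min(ps2.ps_supplycost)
                              FROM partsupp AS ps2
                              WHERE ps2.ps_partkey = p.p_partkey)
    ORDER BY p.p_partkey, ps.ps_suppkey
"""

# Hand-written decorrelated form: aggregate once per ps_partkey, then join.
# The derived table no longer refers to the outer query at all.
pulled_up = """
    SELECT p.p_partkey, ps.ps_suppkey, ps.ps_supplycost
    FROM part AS p
    JOIN partsupp AS ps ON ps.ps_partkey = p.p_partkey
    JOIN (SELECT ps_partkey, min(ps_supplycost) AS min_cost
          FROM partsupp
          GROUP BY ps_partkey) AS m
      ON m.ps_partkey = p.p_partkey
     AND ps.ps_supplycost = m.min_cost
    ORDER BY p.p_partkey, ps.ps_suppkey
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_pulled_up = conn.execute(pulled_up).fetchall()
print(rows_correlated == rows_pulled_up)
```

The equivalence relies on the aggregate being min(): for a part with no
matching partsupp rows the correlated subquery returns NULL and the
comparison fails, which matches the inner join simply finding no group.
An aggregate such as count(), which returns 0 on empty input, would
need more care.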

--
regards, Andrei Lepikhov

Attachment: final_explain.txt (text/plain, 6.6 KB)
