Quick Links

MergeJoin beats HashJoin in the case of multiple hash clauses

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	MergeJoin beats HashJoin in the case of multiple hash clauses
Date:	2023-06-15 08:30:10
Message-ID:	52257607-57f6-850d-399a-ec33a654457b@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, all.

Some of my clients use JOIN's with three - four clauses. Quite
frequently, I see complaints on unreasonable switch of JOIN algorithm to
Merge Join instead of Hash Join. Quick research have shown one weak
place - estimation of an average bucket size in final_cost_hashjoin (see
q2.sql in attachment) with very conservative strategy.
Unlike estimation of groups, here we use smallest ndistinct value across
all buckets instead of multiplying them (or trying to make multivariate
analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has high value of ndistinct, we will overestimate average
size of a bucket and, as a result, prefer to use Merge Join. As the
example in attachment shows, it leads to worse plan than possible,
sometimes drastically worse.
I assume, this is done with fear of functional dependencies between hash
clause components. But as for me, here we should go the same way, as
estimation of groups.
The attached patch shows a sketch of the solution.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
q2.sql	application/sql	1.9 KB
fix_bucketsize.diff	text/plain	674 bytes

Responses

Re: MergeJoin beats HashJoin in the case of multiple hash clauses at 2023-06-28 13:53:06 from Alena Rybakina
Re: MergeJoin beats HashJoin in the case of multiple hash clauses at 2023-09-11 04:51:03 from Andy Fan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2023-06-15 08:32:22	Re: subscription/033_run_as_table_owner is not listed in the meson.build
Previous Message	Vladimir Churyukin	2023-06-15 08:16:31	Re: Bypassing shared_buffers