Re: MergeJoin beats HashJoin in the case of multiple hash clauses

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: MergeJoin beats HashJoin in the case of multiple hash clauses
Date: 2025-03-09 12:13:52
Message-ID: CAPpHfdsTnnya8zCAjzTf8Ytw9=4au3o_8WmVpGK-Zn35fSXcyg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 5, 2025 at 4:43 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > On 17/2/2025 01:34, Alexander Korotkov wrote:
> > > Hi, Andrei!
> > >
> > > On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > > Thank you for your work on this subject. I agree with the general
> > > direction. While everyone has used conservative estimates for a long
> > > time, it's better to change them only when we're sure about it.
> > > However, I'm still not sure I get the conservatism.
> > >
> > >     if (innerbucketsize > thisbucketsize)
> > >         innerbucketsize = thisbucketsize;
> > >     if (innermcvfreq > thismcvfreq)
> > >         innermcvfreq = thismcvfreq;
> > >
> > > AFAICS, even in the worst case (all columns are totally correlated),
> > > the overall bucket size should be the smallest bucket size among
> > > clauses (not the largest). And the same is true of MCV. As a mental
> > > experiment, we can add a new clause to hash join, which is always true
> > > because columns on both sides have the same value. In fact, it would
> > > have almost no influence except for the cost of extracting additional
> > > columns and the cost of executing additional operators. But in the
> > > current model, this additional clause would completely ruin
> > > thisbucketsize and thismcvfreq, making hash join extremely
> > > unappealing. Should we still revise this to calculate the minimum
> > > instead of the maximum?
> > I agree with your point. But I think the code works precisely the way
> > you have described.
>
> You're right. I just mixed up the sides of the comparison operator.

I've revised the commit message, comments, formatting, etc.
I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v3-0001-Use-extended-stats-for-precise-estimation-of-buck.patch application/octet-stream 12.3 KB
