From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> |
Subject: | Re: MergeJoin beats HashJoin in the case of multiple hash clauses |
Date: | 2025-03-05 02:43:28 |
Message-ID: | CAPpHfduyFdZMuCi_qd11Xuo44oaLByZgiPofVDXC-MERMSCWVg@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 17/2/2025 01:34, Alexander Korotkov wrote:
> > Hi, Andrei!
> >
> > On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > Thank you for your work on this subject. I agree with the general
> > direction. While everyone has used conservative estimates for a long
> > time, it's better to change them only when we're sure about it.
> > However, I'm still not sure I get the conservatism.
> >
> >     if (innerbucketsize > thisbucketsize)
> >         innerbucketsize = thisbucketsize;
> >     if (innermcvfreq > thismcvfreq)
> >         innermcvfreq = thismcvfreq;
> >
> > AFAICS, even in the worst case (all columns are totally correlated),
> > the overall bucket size should be the smallest bucket size among
> > clauses (not the largest). And the same is true of MCV. As a mental
> > experiment, we can add a new clause to hash join, which is always true
> > because columns on both sides have the same value. In fact, it would
> > have almost no influence except for the cost of extracting additional
> > columns and the cost of executing additional operators. But in the
> > current model, this additional clause would completely ruin
> > thisbucketsize and thismcvfreq, making hash join extremely
> > unappealing. Should we still revise this to calculate minimum instead
> > of maximum?
> I agree with your point. But I think the code works precisely the way
> you have described.
You're right. I just mixed up the sides of the comparison operator:
the quoted code already keeps the minimum across clauses, not the
maximum.
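
To make the behaviour concrete, here is a minimal standalone sketch of the same running-minimum logic. It is not the planner code: the ClauseEstimate struct and the combine_clause_estimates() function are hypothetical names used only for illustration, and the starting value of 1.0 for both estimates is an assumption of the sketch.

```c
#include <stddef.h>

/*
 * Hypothetical per-clause estimates (illustration only, not a
 * PostgreSQL struct).
 */
typedef struct ClauseEstimate
{
    double      bucketsize;     /* est. fraction of inner rows per bucket */
    double      mcvfreq;        /* est. frequency of the most common value */
} ClauseEstimate;

/*
 * Combine per-clause estimates by keeping the smallest value seen,
 * mirroring the quoted "if (inner... > this...)" comparisons.
 * Starting from 1.0 is an assumption of this sketch.
 */
static ClauseEstimate
combine_clause_estimates(const ClauseEstimate *clauses, size_t nclauses)
{
    ClauseEstimate result = {1.0, 1.0};

    for (size_t i = 0; i < nclauses; i++)
    {
        /* Lower the running estimate only if this clause is smaller. */
        if (result.bucketsize > clauses[i].bucketsize)
            result.bucketsize = clauses[i].bucketsize;
        if (result.mcvfreq > clauses[i].mcvfreq)
            result.mcvfreq = clauses[i].mcvfreq;
    }
    return result;
}
```

With this shape, an extra always-true clause whose estimates are both 1.0 leaves the combined values untouched, which matches the thought experiment in the quoted text.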
------
Regards,
Alexander Korotkov
Supabase