Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

From: Илья Жарков <izharkov1243(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, p(dot)petrov(at)postgrespro(dot)ru
Subject: Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
Date: 2024-12-08 15:24:21
Message-ID: CAKE=rqQS_hT6DV54Vq8bRAFV=A-Q3PdM9pmqCR_D61PPEcjv8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Regarding merge joins, I suppose in some edge cases inner set scan might
not even be started.

FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'

┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

If p.dtype = 'B' was evaluated early, the pointer could move through the
outer set as long as it is evaluated to false.
In the above example, it reaches the end without even accessing the inner
set.

┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

In the opposite scenario:

┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

it would need to start moving the inner pointer at some point.

┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

But even in this case, the pointer may not reach the end in the inner set.

Though this highly depends on how merge join is implemented in the Postgres
code. I have to admit that I have a very vague idea on this...

вс, 8 дек. 2024 г. в 11:44, Andrei Lepikhov <lepihov(at)gmail(dot)com>:

> On 8/12/2024 09:52, Andres Freund wrote:
> >> I think avoiding touching a hash table and an index under MergeJoin can
> also
> >> be beneficial.
> >
> > How would you get significant wins for mergejoins? You need to go
> through both
> > inner and outer anyway?
> In my mind, this trick can be designed for specific cases like sales
> tables, as illustrated before and used by well-rounded developers. I'm
> not sure that such optimisation would be profitable in general. My point
> is that the sales database has lots of categories, and when requesting
> product descriptions, we will not necessarily touch all the categories -
> in that case, the one-sided clause could allow us to avoid scanning some
> tables at all. Am I wrong?
> BTW, may it be used in SEMI JOIN cases?
>
> --
> regards, Andrei Lepikhov
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-12-08 16:37:37 Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
Previous Message Andrew Dunstan 2024-12-08 14:57:57 Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays