Proposal for Merge Join for Non '=' Operators

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal for Merge Join for Non '=' Operators
Date: 2014-04-09 05:24:22
Message-ID: 4205E661176A124FAF891E0A6BA913526595F303@SZXEML507-MBS.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I would like to propose a New merge join algorithm for optimizing non '=' operators. ('<', '<=', '>', '>=')

- Currently Merge join is only supported for '=' operator. For '<' or '>' operator it chooses Nest Loop Join, or Nest loop with materialization.

- I think when tuple from lower node is sorted or sorting cost is very less, then we can use Merge Join for Non Equal operator also and which will give better performance than NLJ (for selecting this new cost calculation can be implemented in planner).

Example for using merge Join for < operator.

T1 T2

3 1

4 2

5 4

Outer tuple (3) need to be compared with inner tuple one by one, so it will satisfy condition at third inner tuple (as 3<4). So here we can save this point of inner tuple so that next outer tuple can directly start comparison from this tuple.

1. In this algorithm we can put one more optimization: Once outer tuple satisfies the Merge QUAL it can skip the Merge QUAL test with remaining inner tuple and directly apply Other QUALs, as merge QUAL will always satisfy for remaining tuples.

Implementation Detail:

1. Need to add new cost calculation mechanism for this. I still have to work on this part.

2. Implementing in Executor

a. This algorithm is almost same as normal merge Join with some changes.

b. Both Inner and Outer Data Sources should be sorted, same as normal merge Join.
ALGORITHM:
Merge Qual (R.A < Q.A)
r = first tuple from R (Outer Relation)
q = first tuple in Q( Inner Relation)
save_pos = q; /* Position to start scanning in relation Q*/
While (fetch tuple r from R till relation end)
{
for each tuple q in Q starting from save_pos
{
Merge Qual Satisfy
{
save_pos = q;
Consume all subsequent tuples and project(just need to match Other Quals if any.)
}
Else
Fetch Next tuple from Q;
}
}

- Performance Comparison:
Suppose tuples of inner and outer is already sorted or Index scan on inner and outer.

* Then cost of NLJ is always O (r*q).

* The cost of this MJ will be b/w: O (n) to O (r*q).

Where r is number of tuple in R (outer relation) and q is number of tuple in Q (inner Relation).

Please provide your feedback/suggestions.

Thanks & Regards,
Dilip Kumar

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajeev rastogi 2014-04-09 05:33:41 Re: Autonomous Transaction (WIP)
Previous Message Rajeev rastogi 2014-04-09 05:05:53 Re: Autonomous Transaction (WIP)