From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Range Merge Join v1 |
Date: | 2017-09-17 17:24:34 |
Message-ID: | CAMp0ubdY_6k11V_u4AHDPOySNbuTmM=kasNj90v5jyM3fa-nSA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 31, 2017 at 1:52 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Updated patch attached. Changelog:
>
> * Rebased
> * Changed MJCompare to return an enum as suggested, but it has 4
> possible values rather than 3.
> * Added support for joining on contains or contained by (@> or <@) and
> updated tests.
The current patch does not work well with multiple keys, and I believe
it's important to solve because it will make it usable for
multi-dimension spatial joins.
The problem is this: the algorithm for a single key demands that the
input is sorted by (lower(r) NULLS FIRST, upper(r) NULLS LAST). That's
easy, because that's also the sort order for the range operator class,
so everything just works.
For multiple keys, the order of the input is:
lower(r1) NULLS FIRST, lower(r2) NULLS FIRST,
upper(r1) NULLS LAST, upper(r2) NULLS LAST
But that can't match up with any opclass, because an opclass can only
order one attribute at a time. In this case, the lower bound of r2 is
more significant than the upper bound of r1.
It's easy enough to adapt the execution to make this work as long as
the input is properly sorted. The challenge is making the optimizer
choose the sort keys properly.
I have tried a few approaches. The current approach that I'm using is:
* have a new, special range operator family with no operator classes
* in check_mergejoinable(), detect the && operator and set the
mergeopfamilies to contain only the special operator family
* in try_mergejoin_path(), it will convert the pathkeys using that
operator class into pathkeys over a "lower" expression over the same
EC; and then add on additional pathkeys for the "upper" expressions
onto the end of the pathkey list
Any comments or alternative suggestions welcome. This will probably
take a few days at least, so I put the patch in "waiting on author"
state.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitriy Sarafannikov | 2017-09-17 17:43:09 | Improving DISTINCT with LooseScan node |
Previous Message | Chris Travers | 2017-09-17 16:48:58 | Re: Add Roman numeral conversion to to_number |