Re: Custom Plan node

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Plan node
Date: 2013-09-07 15:21:31
Message-ID: CADyhKSWL=XBomL2q=CwtRMoTQzJHwVfFk9QXax8GshLg=Y_JNQ@mail.gmail.com
Lists: pgsql-hackers

2013/9/7 David Fetter <david(at)fetter(dot)org>:
> On Sat, Sep 07, 2013 at 02:49:54PM +0200, Kohei KaiGai wrote:
>> 2013/9/7 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
>> > 2013/9/7 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> >> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> >>> I find this a somewhat depressing response. Didn't we discuss this
>> >>> exact design at the developer meeting in Ottawa? I thought it sounded
>> >>> reasonable to you then, or at least I don't remember you panning it.
>> >>
>> >> What I recall saying is that I didn't see how the planner side of it would
>> >> work ... and I still don't see that. I'd be okay with committing
>> >> executor-side fixes only if we had a vision of where we'd go on the
>> >> planner side; but this patch doesn't offer any path forward there.
>> >>
>> > The reason this patch sticks to the executor side is that in Ottawa
>> > we concluded not to touch the planner code at the beginning, because
>> > of its complexity.
>> > I agree that planner support for custom plans would help construct
>> > better execution plans; however, it also makes sense for this feature
>> > to begin as a facility that offers a way to rearrange a plan tree
>> > that has already been constructed.
>> >
>> > Anyway, let me investigate what kinds of APIs should be added for
>> > the planner stage as well.
>> >
>> Here is a brief idea for adding planner support to the custom node, in
>> case we need it from the beginning. Of course, it has not yet been
>> considered in detail and needs much brushing up; however, it may serve
>> as a draft for implementing this feature.
>>
>> We may be able to categorize plan node types into three groups: scan,
>> join, and others.
>>
>> Although the planner tries various combinations of joins and scans
>> to minimize the estimated cost, it has fewer options for other node
>> types such as T_Agg. It seems to me that these other types are placed
>> almost entirely according to the form of the query, so it is not a big
>> problem even if all we can do is manipulate the plan tree at
>> planner_hook. That is similar to what the proposed patch does.
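A minimal sketch of the three-way split and of the post-planning manipulation described above, written as standalone C rather than against PostgreSQL's headers. The `Plan` struct, the `NodeTag` subset, `T_CustomPlan`, `plan_category()` and `replace_other_nodes()` are hypothetical simplifications; in a real extension a walk like this would run inside a planner_hook, after standard_planner() has built the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a few plan node tags (NOT PostgreSQL's nodes.h). */
typedef enum NodeTag {
    T_SeqScan, T_IndexScan, T_ForeignScan,      /* scans */
    T_NestLoop, T_HashJoin, T_MergeJoin,        /* joins */
    T_Agg, T_Sort, T_Limit,                     /* others */
    T_CustomPlan                                /* hypothetical extension node */
} NodeTag;

typedef enum PlanCategory { PLAN_SCAN, PLAN_JOIN, PLAN_OTHER } PlanCategory;

/* Minimal plan node: a tag plus left/right subtrees. */
typedef struct Plan {
    NodeTag      type;
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

/* Scans and joins are where the planner explores alternatives; everything
 * else mostly follows the shape of the query. */
static PlanCategory plan_category(NodeTag tag)
{
    switch (tag) {
    case T_SeqScan: case T_IndexScan: case T_ForeignScan:
        return PLAN_SCAN;
    case T_NestLoop: case T_HashJoin: case T_MergeJoin:
        return PLAN_JOIN;
    default:
        return PLAN_OTHER;
    }
}

/* Rewrite an already-built plan tree: swap every node tagged `target`
 * (which must be an "other" node, e.g. T_Agg) for T_CustomPlan, keeping
 * its children. Returns the number of nodes replaced. */
static int replace_other_nodes(Plan *plan, NodeTag target)
{
    assert(plan_category(target) == PLAN_OTHER);
    if (plan == NULL)
        return 0;

    int replaced = 0;
    if (plan->type == target) {
        plan->type = T_CustomPlan;
        replaced++;
    }
    replaced += replace_other_nodes(plan->lefttree, target);
    replaced += replace_other_nodes(plan->righttree, target);
    return replaced;
}
```

Called on a Limit over Agg over HashJoin(SeqScan, IndexScan) tree with `T_Agg`, it swaps only the Agg node; the join and the scans are left to the planner's cost-based choice.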
>>
>> So, let's focus on joins and scans. We need to give extensions a chance
>> to override a built-in path if they can offer a cheaper one. That leads
>> to an API that allows extensions to add alternative paths while the
>> built-in logic is constructing candidate paths. Once such a path has been
>> added, the candidates can be compared by their estimated costs.
>>
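To make the add-and-compare idea concrete, here is a toy model in C. `RelPaths`, `alternative_path_hook`, `add_candidate_path()`, `offer_custom_join()` and all cost numbers are hypothetical; the real planner's add_path() also weighs startup cost, sort order and parameterization, whereas this sketch keeps only total cost.

```c
#include <assert.h>
#include <stddef.h>

/* Heavily simplified stand-in for a planner path: a label and a total cost. */
typedef struct Path {
    const char *label;
    double      total_cost;
} Path;

#define MAX_PATHS 8

/* Simplified stand-in for the candidate path list of one (join) relation. */
typedef struct RelPaths {
    Path pathlist[MAX_PATHS];
    int  npaths;
} RelPaths;

/* Hypothetical hook: lets an extension add alternative paths after the
 * built-in logic has generated its own candidates. */
typedef void (*alternative_path_hook_type) (RelPaths *rel);
static alternative_path_hook_type alternative_path_hook = NULL;

static void add_candidate_path(RelPaths *rel, const char *label, double cost)
{
    assert(rel->npaths < MAX_PATHS);
    rel->pathlist[rel->npaths].label = label;
    rel->pathlist[rel->npaths].total_cost = cost;
    rel->npaths++;
}

/* Built-in join logic (made-up costs), then give extensions a chance. */
static void generate_join_paths(RelPaths *rel)
{
    add_candidate_path(rel, "HashJoin", 1500.0);
    add_candidate_path(rel, "NestLoop", 4200.0);
    if (alternative_path_hook != NULL)
        alternative_path_hook(rel);
}

/* Built-in and custom paths compete purely on estimated cost. */
static const Path *cheapest_path(const RelPaths *rel)
{
    const Path *best = NULL;
    for (int i = 0; i < rel->npaths; i++)
        if (best == NULL || rel->pathlist[i].total_cost < best->total_cost)
            best = &rel->pathlist[i];
    return best;
}

/* Example extension callback offering a cheaper "custom-join" path. */
static void offer_custom_join(RelPaths *rel)
{
    add_candidate_path(rel, "CustomJoin", 320.0);
}
```

Because the hook runs after the built-in candidates exist, a provider never removes anything; it only adds a competitor, and the usual cheapest-path selection decides.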
>> For example, suppose a query joins foreign tables A and B managed by the
>> same postgres_fdw server; a remote join is then likely cheaper than a
>> local join. If an extension is capable of handling this case correctly,
>> it can add an alternative "custom-join" path with a lower cost.
>> This path would then be transformed into a "CustomJoin" node that sends
>> a query to the remote server and fetches the result of A join B,
>> computed remotely.
>> In this case, it does not matter even if "CustomJoin" has underlying
>> ForeignScan nodes in its left/right subtrees, because the extension can
>> handle them however it likes.
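The cost argument in this example can be checked with a little arithmetic. The sketch assumes a simple model in which fetching rows from a remote server costs a fixed startup amount plus a per-row amount, and in which the join work itself costs the same whether it runs locally or remotely. `ForeignRelInfo`, `RemoteCost`, `should_offer_custom_join()` and every number are illustrative assumptions, not postgres_fdw's actual estimator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified description of a foreign relation (hypothetical fields). */
typedef struct ForeignRelInfo {
    int    server_id;   /* which remote server holds the table */
    double rows;        /* estimated rows the table would ship */
} ForeignRelInfo;

/* Assumed network cost model: fixed startup plus per-row transfer. */
typedef struct RemoteCost {
    double startup_cost;
    double tuple_cost;
} RemoteCost;

static double fetch_cost(RemoteCost c, double rows)
{
    return c.startup_cost + c.tuple_cost * rows;
}

/* Local join: two ForeignScans ship both relations, then join locally. */
static double local_join_cost(RemoteCost c, const ForeignRelInfo *a,
                              const ForeignRelInfo *b, double cpu_per_row)
{
    return fetch_cost(c, a->rows) + fetch_cost(c, b->rows)
           + cpu_per_row * (a->rows + b->rows);
}

/* Remote join: same join work, done remotely; only the result rows are
 * shipped, over a single connection. */
static double remote_join_cost(RemoteCost c, const ForeignRelInfo *a,
                               const ForeignRelInfo *b, double join_rows,
                               double cpu_per_row)
{
    return fetch_cost(c, join_rows) + cpu_per_row * (a->rows + b->rows);
}

/* Offer a custom-join path only when both inputs live on the same server
 * and the remote plan is estimated to be cheaper than the local one. */
static bool should_offer_custom_join(RemoteCost c, const ForeignRelInfo *a,
                                     const ForeignRelInfo *b, double join_rows,
                                     double cpu_per_row, double *cost_out)
{
    if (a->server_id != b->server_id)
        return false;
    double remote = remote_join_cost(c, a, b, join_rows, cpu_per_row);
    if (remote >= local_join_cost(c, a, b, cpu_per_row))
        return false;
    *cost_out = remote;
    return true;
}
```

With 100,000 rows on each side, a 1,000-row join result, startup 100, per-row 0.01 and join work 0.0025 per input row, the local plan costs 2700 and the remote one 610; the gap is one saved startup cost (100) plus not shipping 199,000 rows (1990).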
>>
>> So, at least the following APIs may be needed for planner support:
>>
>> * API to add an alternative join path, in addition to the built-in join logic.
>> * API to add an alternative scan path, in addition to the built-in scan logic.
>> * API to construct a "CustomJoin" node from the corresponding path.
>> * API to construct a "CustomScan" node from the corresponding path.
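One way the four APIs could be packaged is as a table of callbacks registered by the provider. Everything here (`CustomPlanMethods`, `CustomPath`, `CustomPlanNode`, the `demo_*` provider and `plan_join_with_provider()`) is a hypothetical sketch in plain C, not an existing PostgreSQL interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical planner-side description of a path offered by a provider. */
typedef struct CustomPath {
    const char *provider;
    bool        is_join;      /* join path or scan path */
    double      total_cost;
} CustomPath;

/* Hypothetical executable node built from a chosen CustomPath. */
typedef struct CustomPlanNode {
    const char *provider;
    bool        is_join;      /* true: "CustomJoin", false: "CustomScan" */
} CustomPlanNode;

/* One callback per API in the list above; all names are hypothetical. */
typedef struct CustomPlanMethods {
    const char *name;
    bool (*add_scan_path) (int relid, CustomPath *out);
    bool (*add_join_path) (int outer_relid, int inner_relid, CustomPath *out);
    void (*create_custom_scan) (const CustomPath *path, CustomPlanNode *out);
    void (*create_custom_join) (const CustomPath *path, CustomPlanNode *out);
} CustomPlanMethods;

/* Toy provider: pretends relations 1 and 2 live on one remote server. */
static bool demo_is_remote(int relid) { return relid == 1 || relid == 2; }

static bool demo_add_scan_path(int relid, CustomPath *out)
{
    (void) relid;
    (void) out;
    return false;               /* offers no alternative scans */
}

static bool demo_add_join_path(int outer_relid, int inner_relid, CustomPath *out)
{
    if (!demo_is_remote(outer_relid) || !demo_is_remote(inner_relid))
        return false;
    out->provider = "demo";
    out->is_join = true;
    out->total_cost = 320.0;    /* made-up estimate */
    return true;
}

static void demo_create_custom_scan(const CustomPath *path, CustomPlanNode *out)
{
    assert(!path->is_join);
    out->provider = path->provider;
    out->is_join = false;
}

static void demo_create_custom_join(const CustomPath *path, CustomPlanNode *out)
{
    assert(path->is_join);
    out->provider = path->provider;
    out->is_join = true;
}

static const CustomPlanMethods demo_methods = {
    "demo",
    demo_add_scan_path,
    demo_add_join_path,
    demo_create_custom_scan,
    demo_create_custom_join,
};

/* Planner side: ask the provider for a join path, keep it only if it beats
 * the cheapest built-in path, then let the provider build the node. */
static bool plan_join_with_provider(const CustomPlanMethods *m,
                                    int outer_relid, int inner_relid,
                                    double builtin_cost, CustomPlanNode *node)
{
    CustomPath path;
    if (!m->add_join_path(outer_relid, inner_relid, &path))
        return false;
    if (path.total_cost >= builtin_cost)
        return false;
    m->create_custom_join(&path, node);
    return true;
}
```

Keeping path creation and node construction as separate callbacks mirrors the split in the list: the planner can discard an offered path on cost without the provider ever building a node.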
>>
>> Any comments are welcome.
>
> The broad outlines look great.
>
> Do we have any way, at least conceptually, to consider the graph of
> the cluster with edges weighted by network bandwidth and latency?
>
Do you mean something like what postgres_fdw already does?
Its configuration allows adding a cost for connecting to the remote server
as startup cost, and a cost for transferring data over the network that is
multiplied by the estimated number of rows, both configured per server.
I think that is the responsibility of the custom plan provider, and it
depends entirely on the nature of what the provider wants to offer.
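postgres_fdw exposes these two knobs as the server options fdw_startup_cost and fdw_tuple_cost, whose documented defaults are 100 and 0.01. One could map link latency onto the startup cost and bandwidth onto the per-row cost. A sketch of how a provider might resolve such per-server settings, falling back to those defaults, follows; the server names, override values, `ServerCostOptions` and `transfer_cost()` are hypothetical, and real postgres_fdw estimates include further terms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* postgres_fdw's documented defaults for fdw_startup_cost / fdw_tuple_cost. */
#define DEFAULT_STARTUP_COST 100.0
#define DEFAULT_TUPLE_COST   0.01

/* Hypothetical per-server settings; a negative value means "not set". */
typedef struct ServerCostOptions {
    const char *server_name;
    double      startup_cost;   /* plays the role of fdw_startup_cost */
    double      tuple_cost;     /* plays the role of fdw_tuple_cost */
} ServerCostOptions;

static const ServerCostOptions servers[] = {
    {"same_rack", -1.0,   -1.0},   /* falls back to the defaults */
    {"overseas",  2000.0,  0.05},  /* high latency, lower bandwidth */
};

static const ServerCostOptions *find_server(const char *name)
{
    for (size_t i = 0; i < sizeof(servers) / sizeof(servers[0]); i++)
        if (strcmp(servers[i].server_name, name) == 0)
            return &servers[i];
    return NULL;
}

/* Estimated cost of pulling `rows` rows from the named server: a one-time
 * startup cost plus a per-row transfer cost. */
static double transfer_cost(const char *server, double rows)
{
    const ServerCostOptions *opt = find_server(server);
    assert(opt != NULL);
    double startup = opt->startup_cost >= 0 ? opt->startup_cost
                                            : DEFAULT_STARTUP_COST;
    double per_row = opt->tuple_cost >= 0 ? opt->tuple_cost
                                          : DEFAULT_TUPLE_COST;
    return startup + per_row * rows;
}
```

For 1,000 rows this gives 110 from `same_rack` and 2050 from `overseas`, so each server acts as one weighted edge without the planner needing a global picture of the cluster.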

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
