Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, "Jim Mlodgenski" <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-26 08:31:58
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F7FB83@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> * Kouhei Kaigai (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> > This regular one means usual tables. Even though custom implementation
> > may reference self-managed in-memory cache instead of raw heap, the
> > table pointed in user's query shall be a usual table.
> > In the past, Hanada-san had proposed an enhancement of FDW to support
> > remote-join but eventually rejected.
>
> I'm not aware of the specifics around that proposal but I don't believe
> we, as a community, have decided to reject the idea in general.
>
IIUC, his approach integrated join push-down into the FDW APIs. The
rejection of that patch does not mean the idea of remote join itself was
rejected. I believe it is still one of our killer features, if we can
revise the implementation.

Hanada-san, could you explain why your proposal was rejected at that
time?

> > I thought these functions were useful to have in the backend commonly,
> > but is not a fundamental functionality lacks of the custom-scan interface.
>
> Then perhaps they should be exposed more directly? I can understand
> generally useful functionality being exposed in a way that anyone can use
> it, but we need to avoid interfaces which can't be stable due to normal
> / ongoing changes to the backend code.
>
The functions my patches want to expose are:
- get_restriction_qual_cost()
- fix_expr_common()

And the functions my patches newly add are:
- bms_to_string()
- bms_from_string()

The first two functions, get_restriction_qual_cost() and fix_expr_common(),
are currently static because cost estimation is done only in costsize.c and
reference fix-up is done only in setrefs.c. Custom-scan breaks this
assumption, so I made them public. All of their callers need them, but
until now all of those callers lived in a single file.
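
To illustrate the kind of round trip bms_to_string() and bms_from_string()
are meant to provide, here is a minimal sketch. It does not use the
backend's Bitmapset; ToyBitmapset, the toy_bms_* names, the fixed 256-member
limit and the "(b 1 3 5)" text format are all assumptions made for this
example only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BMS_WORDS 4                      /* hypothetical limit */
#define TOY_BMS_MAX   (TOY_BMS_WORDS * 64)   /* members 0..255 */

/* Stand-in for a bitmap set; NOT the backend's Bitmapset. */
typedef struct ToyBitmapset {
    uint64_t words[TOY_BMS_WORDS];
} ToyBitmapset;

/* Add member x; returns 0 on success, -1 if x is out of range. */
static int toy_bms_add(ToyBitmapset *s, int x)
{
    if (x < 0 || x >= TOY_BMS_MAX)
        return -1;
    s->words[x / 64] |= (uint64_t) 1 << (x % 64);
    return 0;
}

static int toy_bms_is_member(const ToyBitmapset *s, int x)
{
    if (x < 0 || x >= TOY_BMS_MAX)
        return 0;
    return (int) ((s->words[x / 64] >> (x % 64)) & 1);
}

/* Write the set as "(b m1 m2 ...)"; returns -1 if buf is too small. */
static int toy_bms_to_string(const ToyBitmapset *s, char *buf, size_t buflen)
{
    size_t used;
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "(b");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;
    for (int x = 0; x < TOY_BMS_MAX; x++) {
        if (!toy_bms_is_member(s, x))
            continue;
        n = snprintf(buf + used, buflen - used, " %d", x);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }
    n = snprintf(buf + used, buflen - used, ")");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return 0;
}

/* Parse the text form back into *out; returns -1 on malformed input. */
static int toy_bms_from_string(const char *str, ToyBitmapset *out)
{
    const char *p = str;

    memset(out, 0, sizeof(*out));
    if (strncmp(p, "(b", 2) != 0)
        return -1;
    p += 2;
    for (;;) {
        char *end;
        long v;

        while (*p == ' ')
            p++;
        if (*p == ')')
            return (p[1] == '\0') ? 0 : -1;
        if (*p == '\0')
            return -1;
        v = strtol(p, &end, 10);
        if (end == p || v < 0 || v >= TOY_BMS_MAX)
            return -1;
        toy_bms_add(out, (int) v);
        p = end;
    }
}
```

A caller would fill a set with toy_bms_add(), pass it through
toy_bms_to_string(), and get an identical set back from
toy_bms_from_string().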

> > I can also understand the usefulness of join or aggregation into the
> > remote side in case of foreign table reference. In similar way, it is
> > also useful if we can push these CPU intensive operations into
> > co-processors on regular table references.
>
> That's fine, if we can get data to and from those co-processors efficiently
> enough that it's worth doing so. If moving the data to the GPU's memory
> will take longer than running the actual aggregation, then it doesn't make
> any sense for regular tables because then we'd have to cache the data in
> the GPU's memory in some way across multiple queries, which isn't something
> we're set up to do.
>
When I built a prototype on top of FDW using CUDA, it ran a sequential scan
10 times faster than SeqScan on regular tables, provided the qualifiers were
complex enough.
The libraries used to communicate with the GPU (OpenCL/CUDA) have an
asynchronous data transfer mode that uses hardware DMA. It allows the cost of
data transfer to be hidden by pipelining, if there are enough records to be
transferred.
Also, the recent trend in semiconductor devices is to integrate the GPU with
the CPU so that they share a common memory space. See Intel's Haswell, AMD's
Kaveri, or NVIDIA's Tegra K1. All of them share the same memory, so there is
no need to transfer the data to be processed. This trend is driven by
physical law, namely the energy consumption of semiconductors. So, I'm
optimistic about my idea.
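
As a rough illustration of why pipelining hides the transfer cost, here is
a minimal sketch of an idealized two-stage model: chunk i+1 is copied by
DMA while chunk i is being processed. The ChunkCost type, the function
names and the assumption of uniform chunks are hypothetical; it is not a
measurement of the prototype.

```c
#include <assert.h>

/* Hypothetical per-chunk costs, in milliseconds. */
typedef struct ChunkCost {
    double transfer_ms;   /* host-to-device copy of one chunk */
    double kernel_ms;     /* GPU processing of one chunk */
} ChunkCost;

/* No overlap: each chunk is copied, then processed, one after another. */
static double serial_total_ms(int nchunks, ChunkCost c)
{
    assert(c.transfer_ms >= 0.0 && c.kernel_ms >= 0.0);
    if (nchunks <= 0)
        return 0.0;
    return nchunks * (c.transfer_ms + c.kernel_ms);
}

/*
 * Idealized overlap: the first copy and the last kernel cannot be hidden;
 * in between, every step costs only the slower of the two stages.
 */
static double pipelined_total_ms(int nchunks, ChunkCost c)
{
    double slower;

    assert(c.transfer_ms >= 0.0 && c.kernel_ms >= 0.0);
    if (nchunks <= 0)
        return 0.0;
    slower = (c.transfer_ms > c.kernel_ms) ? c.transfer_ms : c.kernel_ms;
    return c.transfer_ms + (nchunks - 1) * slower + c.kernel_ms;
}
```

With a single chunk both functions return the same value, which matches the
point above: the benefit appears only when there are enough records to keep
both stages busy.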

> > As I mentioned above, the backend changes by the part-2/-3 patches are
> > just minor stuff, and I thought it should not be implemented by
> > contrib module locally.
>
> Fine- then propose them as generally useful additions, not as patches which
> are supposed to just be for contrib modules using an already defined
> interface. If you can make a case for that then perhaps this is more
> practical.
>
The need was found by the contrib modules, which want to call static
functions or need a feature to translate an existing data structure to/from
cstring. But anyway, does a separate patch make sense?

> > No. What I want to implement is, read the regular table and transfer
> > the contents into GPU's local memory for calculation, then receives
> > its calculation result. The in-memory cache (also I'm working on) is
> > supplemental stuff because disk access is much slower and row-oriented
> > data structure is not suitable for SIMD style instructions.
>
> Is that actually performant? Is it actually faster than processing the
> data directly? The discussions that I've had with folks have cast a great
> deal of doubt in my mind about just how well that kind of quick turn-around
> to the GPU's memory actually works.
>
See above.

> > > This really strikes me as the wrong approach for an FDW
> > > join-pushdown API, which should be geared around giving the remote
> > > side an opportunity on a case-by-case basis to cost out joins using
> > > whatever methods it has available to implement them. I've outlined
> > > above the reasons I don't agree with just making the entire
> > > planner/optimizer pluggable.
> > >
> > I'm also inclined to have arguments that will provide enough
> > information for extensions to determine the best path for them.
>
> For join push-down, I proposed above that we have an interface to the FDW
> which allows us to ask it how much each join of the tables which are on
> a given FDW's server would cost if the FDW did it vs. pulling it back and
> doing it locally. We could also pass all of the relations to the FDW with
> the various join-quals and try to get an answer to everything, but I'm afraid
> that'd simply end up duplicating the logic of the optimizer into every FDW,
> which would be counter-productive.
>
Hmm... It seems to me we should follow the existing way of constructing join
paths, rather than adding special handling. Even if a query contains three or
more foreign tables managed by the same server, they shall be consolidated
into one remote join as long as its cost is less than the local ones.
So, I'd like to bet on using the new add_join_path_hook to compute possible
join paths. If a remote join implemented by custom-scan is cheaper than the
local join, it shall be chosen; the optimizer will then try joining other
foreign tables with this custom-scan node. If the remote join is still
cheaper, it shall be consolidated again.
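
The step-by-step consolidation described above can be sketched as follows.
This is only a toy model, not the planner's code and not the signature of
add_join_path_hook: ToyRel, join_step() and the two cost constants are
hypothetical, and the remote server's CPU cost is ignored so that only
network transfer is charged for remote work.

```c
#include <assert.h>

#define NET_COST_PER_ROW 0.01     /* hypothetical cost to ship one row */
#define CPU_COST_PER_ROW 0.0025   /* hypothetical local cost per input row */

/* Toy relation: a row estimate plus "can be produced by the remote server". */
typedef struct ToyRel {
    double rows;
    int    is_remote;
} ToyRel;

/* Cost of bringing a relation's rows to the local server. */
static double fetch_cost(ToyRel r)
{
    return r.is_remote ? r.rows * NET_COST_PER_ROW : 0.0;
}

/* Local join: fetch both inputs, then pay local CPU for each input row. */
static double local_join_cost(ToyRel outer, ToyRel inner)
{
    return fetch_cost(outer) + fetch_cost(inner)
        + (outer.rows + inner.rows) * CPU_COST_PER_ROW;
}

/* Remote join: only the join result crosses the network. */
static double remote_join_cost(double join_rows)
{
    return join_rows * NET_COST_PER_ROW;
}

/*
 * One join step. The result stays remote only if both inputs are remote
 * and the remote join is cheaper; it can then be consolidated again with
 * the next foreign table by calling join_step() on the result.
 */
static ToyRel join_step(ToyRel outer, ToyRel inner, double selectivity)
{
    ToyRel result;

    assert(selectivity >= 0.0 && selectivity <= 1.0);
    result.rows = outer.rows * inner.rows * selectivity;
    result.is_remote = outer.is_remote && inner.is_remote
        && remote_join_cost(result.rows) < local_join_cost(outer, inner);
    return result;
}
```

Calling join_step() repeatedly over three remote tables keeps the result
remote while each join is selective, and falls back to a local join as soon
as the join result becomes more expensive to ship than its inputs.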

> Admittedly, getting the costing right isn't easy either, but it's not clear
> to me how it'd make sense for the local server to be doing costing for remote
> servers.
>
Right now, I ignore the cost of execution on the remote server and focus on
the cost of transferring data over the network. It might be an idea to
discount the CPU cost of remote execution.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
