Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-26 07:46:42
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F7FAF6@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> * Kouhei Kaigai (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> > Yes, the part-1 patch provides a set of interface portion to interact
> > between the backend code and extension code. Rest of part-2 and part-3
> > portions are contrib modules that implements its feature on top of
> > custom-scan API.
>
> Just to come back to this- the other two "contrib module" patches, at least
> as I read over their initial submission, were *also* patching portions of
> backend code which it was apparently discovered that they needed. That's
> a good bit of my complaint regarding this approach.
>
Sorry, do you still object to the backend changes made by the part-2
and part-3 patches?

> > FDW's join pushing down is one of the valuable use-cases of this
> > interface, but not all. As you might know, my motivation is to
> > implement GPU acceleration feature on top of this interface, that
> > offers alternative way to scan or join relations or potentially sort or
> > aggregate.
>
> If you're looking to just use GPU acceleration for improving individual
> queries, I would think that Robert's work around backend workers would be
> a more appropriate way to go, with the ability to move a working set of
> data from shared buffers and on-disk representation of a relation over to
> the GPU's memory, perform the operation, and then copy the results back.
>
The approach is similar to Robert's work, except that it targets GPUs
instead of multicore CPUs. So I have tried to review his work with a view
to using its facilities in my extension as well.

> If that's not possible or effective wrt performance, then I think we need
> to look at managing the external GPU memory as a foreign system through
> an FDW which happens to be updated through triggers or similar. The same
> could potentially be done for memcached systems, etc.
>
I had not considered exposing the GPU's local memory.
What I am planning, as a supplement to improve data-load performance, is
just a cache mechanism alongside regular tables.

> "regular" PG tables, just to point out one issue, can be locked on a
> row-by-row basis, and we know exactly where in shared buffers to go hunt
> down the rows. How is that going to work here, if this is both a "regular"
> table and stored off in a GPU's memory across subsequent queries or even
> transactions?
>
I think this should be handled on a case-by-case basis. If a row-level lock
is required during the table scan, the custom-scan node should return the
tuple located in shared buffers instead of the cached tuple. Of course, the
custom-scan node also has the option of evaluating the qualifiers on the GPU
against the cached data and then returning the tuples identified by the
ctids of the cached tuples. Either way, I don't think it is a significant
problem.
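
To illustrate the case-by-case handling above, here is a minimal standalone
sketch in C. It is not code from the patch: TupleSource, TupleId, CachedRow,
ScanResult and scan_with_cache are all hypothetical names, and the plain
integer comparison only stands in for the qualifier evaluation that would
run on the GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: where the executor should take the returned tuple from. */
typedef enum TupleSource
{
    TUPLE_FROM_CACHE,           /* the cached copy is good enough */
    TUPLE_FROM_SHARED_BUFFER    /* re-fetch by ctid so the row can be locked */
} TupleSource;

/* Hypothetical stand-in for a ctid: (block number, offset in block). */
typedef struct TupleId
{
    unsigned int   block;
    unsigned short offset;
} TupleId;

/* Hypothetical cache entry: the ctid plus one cached column value. */
typedef struct CachedRow
{
    TupleId ctid;
    int     value;
} CachedRow;

typedef struct ScanResult
{
    TupleId     ctid;
    TupleSource source;
} ScanResult;

/*
 * Evaluate the qualifier "value >= min_value" against cached rows only.
 * "out" must have room for n entries. When a row-level lock is needed,
 * each match is tagged so the caller fetches it from shared buffers by
 * ctid; otherwise the cached tuple itself may be returned.
 */
static size_t
scan_with_cache(const CachedRow *cache, size_t n, int min_value,
                bool needs_row_lock, ScanResult *out)
{
    size_t nmatch = 0;

    assert(cache != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (cache[i].value < min_value)
            continue;           /* qualifier failed on the cached data */
        out[nmatch].ctid = cache[i].ctid;
        out[nmatch].source = needs_row_lock ? TUPLE_FROM_SHARED_BUFFER
                                            : TUPLE_FROM_CACHE;
        nmatch++;
    }
    return nmatch;
}
```

In both cases the qualifier is evaluated on the cached data; only the place
the returned tuple comes from differs.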

> > Right now, I put all the logic to interact CSI and FDW driver on
> > postgres_fdw side, it might be an idea to have common code (like a
> > logic to check whether the both relations to be joined belongs to same
> > foreign server) on the backend side as something like a gateway of them.
>
> Yes, that's what I was suggesting above- we should be asking the FDWs on
> a case-by-case basis how to cost out the join between foreign tables which
> they are responsible for. Asking two different FDWs servers to cost out
> a join between their tables doesn't make any sense to me.
>
OK, I'll move the portion that other FDWs will commonly need into the
backend code.
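
As a rough idea of the kind of common check I mean (whether both relations
to be joined belong to the same foreign server), here is a standalone sketch
in C. RelInfo and join_is_pushdown_candidate are hypothetical names made up
for this example, not existing backend structures or functions, and Oid is
redefined locally only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins so the sketch is self-contained. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, trimmed-down description of one side of a join. */
typedef struct RelInfo
{
    bool is_foreign;    /* true if the relation is a foreign table */
    Oid  server_oid;    /* owning foreign server, InvalidOid if none */
} RelInfo;

/*
 * Gateway check done once on the backend side: a join is worth offering
 * to an FDW only if both sides are foreign tables on the same server.
 */
static bool
join_is_pushdown_candidate(const RelInfo *outer, const RelInfo *inner)
{
    assert(outer != NULL && inner != NULL);

    if (!outer->is_foreign || !inner->is_foreign)
        return false;
    if (outer->server_oid == InvalidOid)
        return false;
    return outer->server_oid == inner->server_oid;
}
```

With a check like this in one place, each FDW driver is asked to cost a join
only between tables it is responsible for.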

> > As an aside, what should be the scope of FDW interface?
> > In my understanding, it allows extension to implement "something" on
> > behalf of a particular data structure being declared with CREATE FOREIGN
> > TABLE.
>
> That's where it is today, but certainly not our end goal.
>
> > In other words, extension's responsibility is to generate a view of
> > "something"
> > according to PostgreSQL' internal data structure, instead of the object
> > itself.
>
> The result of the FDW call needs to be something which PG understands and
> can work with, otherwise we wouldn't be able to, say, run PL/pgsql code
> on the result, or pass it into some other aggregate which we decided was
> cheaper to run locally. Being able to push down aggregates to the remote
> side of an FDW certainly fits in quite well with that.
>
Yes. According to the earlier discussion when postgres_fdw was being
merged, the only things we can trust on the remote side are built-in data
types, functions, operators, and other built-in objects.

> > On the other hands, custom-scan interface allows extensions to
> > implement alternative methods to scan or join particular relations,
> > but it is not a role to perform as a target being referenced in
> > queries. In other words, it is methods to access objects.
>
> The custom-scan interface still needs to produce "something" according to
> PG's internal data structures, so it's not clear to me where you're going
> with this.
>
The custom-scan node is intended to work on regular relations, not only
foreign tables. That means a special feature (like GPU acceleration) can be
applied transparently to most existing applications. Usually, applications
define regular tables, not foreign tables, for their work at installation
time. That is my biggest concern.

> > It is natural both features are similar because both of them intends
> > extensions to hook the planner and executor, however, its purpose is
> > different.
>
> I disagree as I don't really view FDWs as "hooks". A "hook" is more like
> a trigger- sure, you can modify the data in transit, or throw an error if
> you see an issue, but you don't get to redefine the world and throw out
> what the planner or optimizer knows about the rest of what is going on in
> the query.
>
I may have worded that poorly. Anyway, I want a plan node that lets an
extension define its behavior; it is similar to ForeignScan but can also
operate on regular relations. Also, every plan node, not only custom-scan
and foreign-scan, works through an interface for cooperating with other
nodes, so it is not strange that the two interfaces are similar.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
