Quick Links

Re: Custom Scan APIs (Re: Custom Plan node)

From:	Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject:	Re: Custom Scan APIs (Re: Custom Plan node)
Date:	2014-03-04 15:00:53
Message-ID:	CADyhKSVciCjNpReNK-mRR=PE1Bi0P0WTBZZy5jCsUAcy1uqKhA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2014-03-04 23:10 GMT+09:00 Stephen Frost <sfrost(at)snowman(dot)net>:
>> The "cache_scan" module that I and Haribabu are discussing in another
>> thread also might be a good demonstration for custom-scan interface,
>> however, its code scale is a bit larger than ctidscan.
>
> That does sound interesting though I'm curious about the specifics...
>
This module caches a part of columns, but not all, thus allows to hold
much larger number of records for a particular amount of RAM than the
standard buffer cache.
It is constructed on top of custom-scan node, and also performs a new
hook for a callback on page vacuuming to invalidate its cache entry.
(I originally designed this module for demonstration of on-vacuum hook
because I already made ctidscan and postgres_fdw enhancement for
custom-scan node, by the way.)

>> > For one thing, an example where you could have this CustomScan node calling
>> > other nodes underneath would be interesting. I realize the CTID scan can't
>> > do that directly but I would think your GPU-based system could; after all,
>> > if you're running a join or an aggregate with the GPU, the rows could come
>> > from nearly anything. Have you considered that, or is the expectation that
>> > users will just go off and access the heap and/or whatever indexes directly,
>> > like ctidscan does? How would such a requirement be handled?
>> >
>> In case when custom-scan node has underlying nodes, it shall be invoked using
>> ExecProcNode as built-in node doing, then it will be able to fetch tuples
>> come from underlying nodes. Of course, custom-scan provider can perform the
>> tuples come from somewhere as if it came from underlying relation. It is
>> responsibility of extension module. In some cases, it shall be required to
>> return junk system attribute, like ctid, for row-level locks or table updating.
>> It is also responsibility of the extension module (or, should not add custom-
>> path if this custom-scan provider cannot perform as required).
>
> Right, tons of work to do to make it all fit together and play nice-
> what I was trying to get at is: has this actually been done? Is the GPU
> extension that you're talking about as the use-case for this been
> written?
>
Its chicken-and-egg problem, because implementation of the extension module
fully depends on the interface from the backend. Unlike commit-fest, here is no
deadline for my extension module, so I put higher priority on the submission of
custom-scan node, than the extension.
However, GPU extension is not fully theoretical stuff. I had implemented
a prototype using FDW APIs, and it allowed to accelerate sequential scan if
query has enough complicated qualifiers.

See the movie (from 2:45). The table t1 is a regular table, and t2 is a foreign
table. Both of them has same contents, however, response time of the query
is much faster, if GPU acceleration is working.
http://www.youtube.com/watch?v=xrUBffs9aJ0
So, I'm confident that GPU acceleration will have performance gain once it
can run regular tables, not only foreign tables.

> How does it handle all of the above? Or are we going through
> all these gyrations in vain hope that it'll actually all work when
> someone tries to use it for something real?
>
I don't talk something difficult. If junk attribute requires to return "ctid" of
the tuple, custom-scan provider reads a tuple of underlying relation then
includes a correct item pointer. If this custom-scan is designed to run on
the cache, all it needs to do is reconstruct a tuple with correct item-pointer
(thus this cache needs to have ctid also). It's all I did in the cache_scan
module.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Re: Custom Scan APIs (Re: Custom Plan node) at 2014-03-04 14:10:41 from Stephen Frost

Responses

Re: Custom Scan APIs (Re: Custom Plan node) at 2014-03-04 19:34:54 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2014-03-04 15:06:54	Re: plpgsql.warn_shadow
Previous Message	Yuri Levinsky	2014-03-04 14:53:49	Re: requested shared memory size overflows size_t