Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-01-27 09:16:01
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F6EDCE@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers,

Is somebody available to volunteer to review the custom-scan patch?

Even though Hanada-san acknowledged before, it seems to me this patch
has potentially arguable implementations. Even if you have enough time
to review whole of the code, it helps me if you can comment on the
following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds "add_join_path_hook" on add_paths_to_joinrel to allow
extensions to provide alternative scan path in addition to the built-in
join paths. Custom-scan path being added is assumed to perform to scan
on a (virtual) relation that is a result set of joining relations.
My concern is its arguments to be pushed. This hook is declared as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
RelOptInfo *innerrel,
JoinType jointype,
SpecialJoinInfo *sjinfo,
List *restrictlist,
List *mergeclause_list,
SemiAntiJoinFactors *semifactors,
Relids param_source_rels,
Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

Likely, its arguments upper than restrictlist should be informed to extensions,
because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain how about other arguments should be informed.
Probably, it makes sense to inform param_source_rels and extra_lateral_rels
to check whether the path is sensible for parameterized paths.
On the other hand, I doubt whether mergeclause_list is usuful to deliver.
(It may make sense if someone tries to implement their own merge-join
implementation??)

I'd like to seem idea to improve the current interface specification.

(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
#define INNER_VAR 65000 /* reference to inner subplan */
#define OUTER_VAR 65001 /* reference to outer subplan */
#define INDEX_VAR 65002 /* reference to index column */
+#define CUSTOM_VAR 65003 /* reference to custom column */

I newly added CUSTOM_VAR to handle a case when custom-scan override
join relations.
Var-nodes within join plan are adjusted to refer either ecxt_innertuple or
ecxt_outertuple of ExprContext. It makes a trouble if custom-scan runs
instead of built-in joins, because its tuples being fetched are usually
stored on the ecxt_scantuple, thus Var-nodes also need to have right
varno neither inner nor outer.

SetPlanRefCustomScan callback, being kicked on set_plan_refs, allows
extensions to rewrite Var-nodes within custom-scan node to indicate
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
For example, a var-node with varno=CUSTOM_VAR and varattno=3 means
this node reference the third attribute of the tuple in ecxt_scantuple.
I think it is a reasonable solution, however, I'm not 100% certain
whether people have more graceful idea to implement it.

If you have comments around above two topic, or others, please give
your ideas.

Thanks,

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kohei KaiGai
> Sent: Tuesday, January 14, 2014 11:20 PM
> To: Shigeru Hanada
> Cc: Kaigai, Kouhei(海外, 浩平); Jim Mlodgenski; Robert Haas; Tom Lane;
> PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> The attached patches are the ones rebased to the latest git tree, but no
> functional changes from the previous revision on the commit-fest:Nov.
> Hanada-san volunteered to review the series of patches, including the
> portion for postgres_fdw, then marked it as "ready for committer" on the
> last commit fest.
> So, I hope someone of committer also volunteer to review the patches for
> final checking.
>
> * Part-1 - CustomScan APIs
> This patch provides a set of interfaces to interact query-optimizer and
> -executor for extensions. The new add_scan_path_hook or add_join_path_hook
> allows to offer alternative ways to scan a particular relation or to join
> a particular relations.
> Then, once the alternative ways are chosen by the optimizer, associated
> callbacks shall be kicked from the executor. In this case, extension has
> responsibility to return a slot that hold a tuple (or empty for end of scan)
> being scanned from the underlying relation.
>
> * Part-2 - contrib/ctidscan
> This patch provides a simple example implementation of CustomScan API.
> It enables to skip pages when inequality operators are given on ctid system
> columns. That is, at least, better than sequential full-scan, so it usually
> wins to SeqScan, but Index-scan is much better.
>
> * Part-3 - remote join implementation
> This patch provides an example to replace a join by a custom scan node that
> runs on a result set of remote join query, on top of existing postgres_fdw
> extension. The idea is, a result set of remote query looks like a relation
> but intangible, thus, it is feasible to replace a local join by a scan on
> the result set of a query executed on the remote host, if both of the relation
> to be joined belongs to the identical foreign server.
> This patch gives postgres_fdw a capability to run a join on the remote host.
>
> Thanks,
>
>
> 2013/12/16 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
> > KaiGai-san,
> >
> > 2013/12/16 KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> >> (2013/12/16 14:15), Shigeru Hanada wrote:
> >>> (1) ctidscan
> >>> Is session_preload_libraries available to enable the feature, like
> >>> shared_*** and local_***? According to my trial it works fine like
> >>> two similar GUCs.
> >>>
> >> It shall be available; nothing different from the two parameters that
> >> we have supported for long time. Sorry, I missed the new feature to
> >> mention about.
> >
> > Check.
> >
> >>> (2) postgres_fdw
> >>> JOIN push--down is a killer application of Custom Scan Provider
> >>> feature, so I think it's good to mention it in the "Remote Query
> >>> Optimization" section.
> >>>
> >> I added an explanation about remote join execution on the section.
> >> Probably, it help users understand why Custom Scan node is here
> >> instead of Join node. Thanks for your suggestion.
> >
> > Check.
> >
> > I think that these patches are enough considered to mark as "Ready for
> > Committer".
> >
> > Regards,
> > --
> > Shigeru HANADA
>
>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-01-27 09:27:17 Re: A better way than tweaking NTUP_PER_BUCKET
Previous Message Etsuro Fujita 2014-01-27 08:06:19 Re: inherit support for foreign tables