From: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> |
---|---|
To: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net> |
Subject: | Re: Custom Scan APIs (Re: Custom Plan node) |
Date: | 2014-03-20 01:45:44 |
Message-ID: | 9A28C8860F777E439AA12E8AEA7694F8F8CC42@BPXM15GP.gisp.nec.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
> > * Definition of add_join_path_hook
> >
> > I didn't have idea to improve the definition and location of this
> > hook, so it is still on the tail of the add_paths_to_joinrel().
> > Its definition was a bit adjusted according to the feedback on the
> > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > from the argument list. Indeed, these are specific to the built-in
> > MergeJoin logic and easy to reproduce.
> >
>
After the submission, I'm still investigating better way to put a hook
to add custom join paths.
Regarding to the comment from Tom:
| * The API for add_join_path_hook seems overcomplex, as well as too full
| of implementation details that should remain private to joinpath.c.
| I particularly object to passing the mergeclause lists, which seem unlikely
| to be of interest for non-mergejoin plan types anyway.
| More generally, it seems likely that this hook is at the wrong level of
| abstraction; it forces the hook function to concern itself with a lot of
| stuff about join legality and parameterization (which I note your patch3
| code fails to do; but that's not an optional thing).
|
The earlier half was already done. My trouble is the later portion.
The overall jobs of add_join_path_hook are below:
1. construction of parameterized path information; being saved at
param_source_rel and extra_lateral_rels.
2. Try to add mergejoin paths with underlying Sort node
3. Try to add mergejoin/nestloop paths without underlying Sort node
4. Try to add hashjoin paths
It seems to me the check for join legality and parameterization are
built within individual routines for each join algorithm.
(what does the "join legality check" actually mean?)
Probably, it makes sense to provide a common utility function to be
called back from the extension if we can find out a common part for
all the join logics, however, I don't have clear idea to cut off the
common portion. What's jobs can be done independent from the join
algorithm??
I'd like to see ideas around this issue. Of course, I also think it
is still an option to handle it by extension on the initial version.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> Sent: Monday, March 17, 2014 9:29 AM
> To: Kaigai Kouhei(海外 浩平); Tom Lane
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> I adjusted the custom-plan interface patch little bit for the cache-only
> scan patch; that is a demonstration module for vacuum-page hook on top of
> the custom-plan interface.
>
> fix_scan_expr() looks to me useful for custom-plan providers that want to
> implement its own relation scan logic, even though they can implement it
> using fix_expr_common() being already exposed.
>
> Also, I removed the hardcoded portion from the nodeCustom.c although, it
> may make sense to provide a few template functions to be called by custom-
> plan providers, that performs usual common jobs like construction of expr-
> context, assignment of result-slot, open relations, and so on.
> I though the idea during implementation of BeginCustomPlan handler.
> (These template functions are not in the attached patch yet.) How about
> your opinion?
>
> The major portion of this patch is not changed from v10.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> > Sent: Wednesday, March 12, 2014 1:55 PM
> > To: Tom Lane
> > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > Robert Haas; PgHacker; Peter Eisentraut
> > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> >
> > Hello,
> >
> > The attached two patches are the revised custom-plan interface and
> > example usage that implements existing MergeJoin on top of this interface.
> >
> > According to the discussion last week, I revised the portion where
> > custom-node is expected to perform a particular kind of task, like
> > scanning a relation, by putting polymorphism with a set of callbacks
> > set by custom-plan provider.
> > So, the core backend can handle this custom-plan node just an
> > abstracted plan-node with no anticipation.
> > Even though the subject of this message says "custom-scan", I'd like
> > to name the interface "custom-plan" instead, because it became fully
> > arbitrary of extension whether it scan on a particular relation.
> >
> > Definition of CustomXXXX data types were simplified:
> >
> > typedef struct CustomPath
> > {
> > Path path;
> > const struct CustomPathMethods *methods;
> > } CustomPath;
> >
> > typedef struct CustomPlan
> > {
> > Plan plan;
> > const struct CustomPlanMethods *methods;
> > } CustomPlan;
> >
> > typedef struct CustomPlanState
> > {
> > PlanState ps;
> > const CustomPlanMethods *methods;
> > } CustomPlanState;
> >
> > Each types have a base class and a set of function pointers that
> > characterize the behavior of this custom-plan node.
> > In usual use-cases, extension is expected to extend these classes to
> > keep their private data fields needed to implement its own
> functionalities.
> >
> > Most of the methods are designed to work as a thin layer towards
> > existing planner / executor functions, so custom-plan provides has to
> > be responsible to implement its method to communicate with core
> > backend as built-in ones doing.
> >
> > Regarding to the topic we discussed last week,
> >
> > * CUSTOM_VAR has gone.
> > The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN
> > command output (including column names being referenced) even if a
> > custom-plan node replaced a join but has no underlying subplans on
> left/right subtrees.
> > A typical situation like this is a remote-join implementation that I
> > tried to extend postgres_fdw on top of the previous interface.
> > It retrieves a flat result set of the remote join execution, thus has
> > no subplan locally. On the other hand, EXPLAIN tries to find out
> > "actual" Var node from the underlying subplan if a Var node has
> > special varno (INNER/OUTER/INDEX).
> > I put a special method to solve the problem. GetSpecialCustomVar
> > method is called if a certain Var node of custom-plan has a special
> > varno, then custom-plan provider can inform the core backend an
> > expression node to be referenced by this Var node.
> > It allows to solve the column name without recursive walking on the
> > subtrees, so it enables a custom-plan node that replaces a part of
> plan-tree.
> > This method is optional, so available to adopt existing way if
> > custom-plan provider does not do anything special.
> >
> >
> > * Functions to be exposed, from static declaration
> >
> > Right now, static functions are randomly exposed on demand.
> > So, we need more investigation which functions are needed, and which
> > others are not.
> > According to my trial, the part-2 patch that is MergeJoin on top of
> > the custom-plan interface, class of functions that recursively walk on
> > subplan tree have to be exposed. Like, ExplainPreScanNode,
> > create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan.
> > In case when custom-plan performs like built-in Append node, it keeps
> > a list of sub-plans in its private field, so the core backend cannot
> > know existence of sub-plans, thus its unavailable to make subplan,
> > unavailable to output EXPLAIN and so on.
> > It does not make sense to reworking on the extension side again.
> > Also, createplan.c has many useful functions to construct plan-node,
> > however, most of them are static because all the built-in plan-node
> > are constructed by the routines in this file, we didn't need to expose
> > them to others. I think, functions in createplan.c being called by
> > create_xxxx_plan() functions to construct plan-node should be exposed
> > for extension's convenient.
> >
> >
> > * Definition of add_join_path_hook
> >
> > I didn't have idea to improve the definition and location of this
> > hook, so it is still on the tail of the add_paths_to_joinrel().
> > Its definition was a bit adjusted according to the feedback on the
> > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > from the argument list. Indeed, these are specific to the built-in
> > MergeJoin logic and easy to reproduce.
> >
> >
> > * Hook location of add_scan_path_hook
> >
> > I moved the add_scan_path_hook and set_cheapest() into
> > set_base_rel_pathlists() from various caller locations;
> > set_xxxx_pathlist() functions typically.
> > It enabled to consolidate the location to add custom-path for base
> > relations.
> >
> >
> > * CustomMergeJoin as a proof-of-concept
> >
> > The contrib module in the part-2 portion is, a merge-join
> > implementation on top of custom-plan interface, even though 99% of its
> > implementation is identical with built-in ones.
> > Its purpose is to demonstrate a custom join logic can be implemented
> > using custom-plan interface, even if custom-plan node has underlying
> > sub-plans unlike previous my examples.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >
> >
> > > -----Original Message-----
> > > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > > Sent: Friday, March 07, 2014 3:09 AM
> > > To: Kaigai Kouhei(海外 浩平)
> > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > > Robert Haas; PgHacker; Peter Eisentraut
> > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> > >
> > > Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> > > > I expected to include simple function pointers for copying and
> > > > text-output as follows:
> > >
> > > > typedef struct {
> > > > Plan plan;
> > > > :
> > > > NodeCopy_function node_copy;
> > > > NodeTextOut_function node_textout;
> > > > } Custom;
> > >
> > > I was thinking more like
> > >
> > > typedef struct CustomPathFuncs {
> > > const char *name; /* used for debugging purposes only */
> > > NodeCopy_function node_copy;
> > > NodeTextOut_function node_textout;
> > > ... etc etc etc ...
> > > } CustomPathFuncs;
> > >
> > > typedef struct CustomPath {
> > > Path path;
> > > const CustomPathFuncs *funcs;
> > > ... maybe a few more fields here, but not too darn many ...
> > > } CustomPath;
> > >
> > > and similarly for CustomPlan.
> > >
> > > The advantage of this way is it's very cheap for (what I expect will
> > > be) the common case where an extension has a fixed set of support
> > > functions for its custom paths and plans. It just declares a static
> > > constant CustomPathFuncs struct, and puts a pointer to that into its
> paths.
> > >
> > > If an extension really needs to set the support functions on a
> > > per-object basis, it can do this:
> > >
> > > typdef struct MyCustomPath {
> > > CustomPath cpath;
> > > CustomPathFuncs funcs;
> > > ... more fields ...
> > > } MyCustomPath;
> > >
> > > and then initialization of a MyCustomPath would include
> > >
> > > mypath->cpath.funcs = &mypath->funcs;
> > > mypath->funcs.node_copy = MyCustomPathCopy;
> > > ... etc etc ...
> > >
> > > In this case we're arguably wasting one pointer worth of space in
> > > the path, but considering the number of function pointers such a
> > > path will be carrying, I don't think that's much of an objection.
> > >
> > > >> So? If you did that, then you wouldn't have renumbered the Vars
> > > >> as INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at
> > > >> all; if it is needed, then there would also be a need for an
> > > >> additional tuple slot in executor contexts, which you haven't
> provided.
> > >
> > > > For example, the enhanced postgres_fdw fetches the result set of
> > > > remote join query, thus a tuple contains the fields come from both
> side.
> > > > In this case, what varno shall be suitable to put?
> > >
> > > Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
> > solution.
> > > Consider for example a join where both tables supply columns named "id"
> > > --- if you put them both in one tupledesc then there's no non-kluge
> > > way to identify them.
> > >
> > > Possibly the route to a solution involves adding another plan-node
> > > callback function that ruleutils.c would use for printing Vars in
> > > custom
> > join nodes.
> > > Or maybe we could let the Vars keep their original RTE numbers,
> > > though that would complicate life at execution time.
> > >
> > > Anyway, if we're going to punt on add_join_path_hook for the time
> > > being, this problem can probably be left to solve later. It won't
> > > arise for simple table-scan cases, nor for single-input plan nodes
> > > such
> > as sorts.
> > >
> > > regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Gurjeet Singh | 2014-03-20 02:29:32 | Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally. |
Previous Message | Pavel Stehule | 2014-03-20 00:46:19 | Re: four minor proposals for 9.5 |