Re: [v9.5] Custom Plan API

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-07-03 14:00:18
Message-ID: CAEZqfEcq1vuVhOz+bUXgG3Sawx51h0ARyr5aFeu4rz6RAJkiSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kaigai-san,

Sorry for lagged response.

Here are my random thoughts about the patch. I couldn't understand
the patch fully, because some of APIs are not used by ctidscan. If

Custom Scan patch v2 review

* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
with 's'. Do you plan to use custclass as bit field? If so, values
for custom plan class should not be a character. Otherwise, custclass
should be compared by == operator.

* Purpose of GetSpecialCustomVar()
The reason why FinalizeCustomPlan callback is necessary is not clear to me.
Could you show a case that the API would be useful?

* Purpose of FinalizeCustomPlan()
The reason why FinalizeCustomPlan callback is necessary is not clear
to me, because ctidscan just calls finalize_primnode() and
bms_add_members() with given information. Could you show a case that
the API would be useful?

* Is it ok to call set_cheapest() for all relkind?
Now set_cheapest() is called not for only relation and foreign table
but also custom plan, and other relations such as subquery, function,
and value. Calling call_custom_scan_provider() and set_cheapest() in
the case of RTE_RELATION seems similar to the old construct, how do
you think about this?

* Is it hard to get rid of CopyCustomPlan()?
Copying ForeignScan node doesn't need per-FDW copy function by
limiting fdw_private to have only copy-able objects. Can't we use the
same way for CustomPlan? Letting authors call NodeSetTag or
copyObject() sounds uncomfortable to me.

This would be able to apply to TextOutCustomPlan() and TextOutCustomPath() too.

* MultiExec support is appropriate for the first version?
The cases need MultiExec seems little complex for the first version of
custom scan. What kind of plan do you image for this feature?

* Does SupportBackwardScan() have enough information?
Other scans check target list with TargetListSupportsBackwardScan().
Isn't it necessary to check it for CustomPlan too in
ExecSupportsBackwardScan()?

* Place to call custom plan provider
Is it necessary to call provider when relkind != RELKIND_RELATION? If
yes, isn't it necessary to call for append relation?

I know that we concentrate to replacing scan in the initial version,
so it would not be a serious problem, but it would be good to consider
extensible design.

* Custom Plan Provider is "addpath"?
Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER
seems little odd.
Using handler like FDW makes the design too complex and/or messy?

* superclass of CustomPlanState
CustomPlanState derives ScanState, instead of deriving PlanState
directly. I worry the case of non-heap-scan custom plan, but it might
be ok to postpone consideration about that at the first cut.

* Naming and granularity of objects related to custom plan
I'm not sure the current naming is appropriate, especially difference
between "custom plan" and "provider" and "handler". In the context of
CREATE CUSTOM PLAN statement, what the term "custom plan" means? My
impression is that "custom plan" is an alternative plan type, e.g.
ctidscan or pg_strom_scan. Then what the term "provider" means? My
impression for that is extension, such as ctidscan and pg_strom. The
grammar allows users to pass function via PROVIDER clause of CREATE
CUSTOM SCAN, so the function would be the provider of the custom plan
created by the statement.

* enable_customscan
GUC parameter enable_customscan would be useful for users who want to
disable custom plan feature temporarily. In the case of pg_strom,
using GPU for limited sessions for analytic or batch applications
seems handy.

* Adding pg_custom_plan catalog
Using "cust" as prefix for pg_custom_plan causes ambiguousness which
makes it difficult to choose catalog prefix for a feature named
"Custom Foo" in future. How about using "cusp" (CUStom Plan)?

Or is it better to use pg_custom_plan_provider as catalog relation
name, as the document says that "CREATE CUSTOM PLAN defines custom
plan provider". Then prefix could be "cpp" (Custom Plan Provider).
This seems to match the wording used for pg_foreign_data_wrapper.

* CREATE CUSTOM PLAN statement
This is just a question: We need to emit CREATE CUSTOM PLAN if we want to use
I wonder how it is extended when supporting join as custom class.

* New operators about TID comparison
IMO this portion should be a separated patch, because it adds OID
definition of existing operators such as tidgt and tidle. Is there
any (explicit or implicit) rule about defining macro for oid of an
operator?

* Prototype of get_custom_plan_oid()
custname (or cppname if use the rule I proposed above) seems
appropriate as the parameter name of get_custom_plan_oid() because
similar funcitons use catalog column names in such case.

* Coding conventions
Some lines are indented with white space. Future pgindent run will
fix this issue?

* Unnecessary struct forward declaration
Forward declarations of CustomPathMethods, Plan, and CustomPlan in
includes/nodes/relation.h seem unncecessary. Other headers might have
same issue.

* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed but it's never called outside
createplan.c.

* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr
(via fix_scan_list macro), isn't it able to do it in set_plan_refs()
before calling SetCustomPlanRef()?

* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos works. Is it for a custom
plan provider which supports marking position and rewind to the
position, and ctidscan just lacks capability to do that, so it is not
used anywhere?

* Unnecessary changes in allpaths.c
some comment about Subquery and CTE are changed (perhaps) accidentally.

* Typos
* planenr -> planner, implements -> implement in create_custom_plan.sgml
* CustomScan in nodeCustom.h should be CustomPlan?
* delivered -> derived, in src/backend/optimizer/util/pathnode.c

* Document "Writing Custom Plan Provider" is not provided
Custom Plan Provider author would (and I DO!) hope documents about
writing a custom plan provider.

Regards,

2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> Hanada-san,
>
> Thanks for your checks. I oversight the points when I submit the patch, sorry.
> The attached one is revised one on documentation stuff and contrib/Makefile.
>
> Thanks,
>
> 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
>> Kaigai-san,
>>
>> I've just applied v1 patch, and tried build and install, but I found two issues:
>>
>> 1) The contrib/ctidscan is not automatically built/installed because
>> it's not described in contrib/Makefile. Is this expected behavior?
>> 2) I got an error message below when building document.
>>
>> $ cd doc/src/sgml
>> $ make
>> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
>> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
>> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
>> "SQL-CREATECUSTOMPLAN"
>> make: *** [HTML.index] Error 1
>> make: *** Deleting file `HTML.index'
>>
>> I'll review another part of the patch, including the design.
>>
>>
>> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
>>> According to the discussion upthread, I revised the custom-plan patch
>>> to focus on regular relation scan but no join support right now, and to
>>> support DDL command to define custom-plan providers.
>>>
>>> Planner integration with custom logic to scan a particular relation is
>>> enough simple, unlike various join cases. It's almost similar to what
>>> built-in logic are doing now - custom-plan provider adds a path node
>>> with its cost estimation if it can offer alternative way to scan referenced
>>> relation. (in case of no idea, it does not need to add any paths)
>>>
>>> A new DDL syntax I'd like to propose is below:
>>>
>>> CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
>>>
>>> <name> is as literal, put a unique identifier.
>>> <class> is workload type to be offered by this custom-plan provider.
>>> "scan" is the only option right now, that means base relation scan.
>>> <function_name> is also as literal; it shall perform custom-plan provider.
>>>
>>> A custom-plan provider function is assumed to take an argument of
>>> "internal" type to deliver a set of planner information that is needed to
>>> construct custom-plan pathnode.
>>> In case of "scan" class, pointer towards an customScanArg object
>>> shall be delivered on invocation of custom-plan provider.
>>>
>>> typedef struct {
>>> uint32 custom_class;
>>> PlannerInfo *root;
>>> RelOptInfo *baserel;
>>> RangeTblEntry *rte;
>>> } customScanArg;
>>>
>>> In case when the custom-plan provider function being invoked thought
>>> it can offer an alternative scan path on the relation of "baserel", things
>>> to do is (1) construct a CustomPath (or its inherited data type) object
>>> with a table of callback function pointers (2) put its own cost estimation,
>>> and (3) call add_path() to register this path as an alternative one.
>>>
>>> Once the custom-path was chosen by query planner, its CreateCustomPlan
>>> callback is called to populate CustomPlan node based on the pathnode.
>>> It also has a table of callback function pointers to handle various planner's
>>> job in setrefs.c and so on.
>>>
>>> Similarly, its CreateCustomPlanState callback is called to populate
>>> CustomPlanState node based on the plannode. It also has a table of
>>> callback function pointers to handle various executor's job during quey
>>> execution.
>>>
>>> Most of callback designs are not changed from the prior proposition in
>>> v9.4 development cycle, however, here is a few changes.
>>>
>>> * CustomPlan became to inherit Scan, and CustomPlanState became to
>>> inherit ScanState. Because some useful routines to implement scan-
>>> logic, like ExecScan, expects state-node has ScanState as its base
>>> type, it's more kindness for extension side. (I'd like to avoid each
>>> extension reinvent ExecScan by copy & paste!)
>>> I'm not sure whether it should be a union of Join in the future, however,
>>> it is a reasonable choice to have compatible layout with Scan/ScanState
>>> to implement alternative "scan" logic.
>>>
>>> * Exporting static functions - I still don't have a graceful answer here.
>>> However, it is quite natural that extensions to follow up interface updates
>>> on the future version up of PostgreSQL.
>>> Probably, it shall become clear what class of functions shall be
>>> exported and what class of functions shall be re-implemented within
>>> extension side in the later discussion.
>>> Right now, I exported minimum ones that are needed to implement
>>> alternative scan method - contrib/ctidscan module.
>>>
>>> Items to be discussed later:
>>> * planner integration for relations join - probably, we may define new
>>> custom-plan classes as alternative of hash-join, merge-join and
>>> nest-loop. If core can know this custom-plan is alternative of hash-
>>> join, we can utilize core code to check legality of join.
>>> * generic key-value style options in custom-plan definition - Hanada
>>> san proposed me off-list - like foreign data wrapper. It may enable
>>> to configure multiple behavior on a binary.
>>> * ownership and access control of custom-plan. right now, only
>>> superuser can create/drop custom-plan provider definition, thus,
>>> it has no explicit ownership and access control. It seems to me
>>> a reasonable assumption, however, we may have a usecase that
>>> needs custom-plan by unprivileged users.
>>>
>>> Thanks,
>>>
>>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
>>>>> On 8 May 2014 22:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>
>>>>> >> We're past the prototyping stage and into productionising what we
>>>>> >> know works, AFAIK. If that point is not clear, then we need to
>>>>> >> discuss that first.
>>>>> >
>>>>> > OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
>>>>> > it's all speculation that certain hooks might be useful, and
>>>>> > speculation that's not supported by a lot of evidence. If you think
>>>>> > this isn't prototyping, I wonder what you think *is* prototyping.
>>>>>
>>>>> My research contacts advise me of this recent work
>>>>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
>>>>> and also that they expect a prototype to be ready by October, which I have
>>>>> been told will be open source.
>>>>>
>>>>> So there are at least two groups looking at this as a serious option for
>>>>> Postgres (not including the above paper's authors).
>>>>>
>>>>> That isn't *now*, but it is at least a time scale that fits with acting
>>>>> on this in the next release, if we can separate out the various ideas and
>>>>> agree we wish to proceed.
>>>>>
>>>>> I'll submerge again...
>>>>>
>>>> Through the discussion last week, our minimum consensus are:
>>>> 1. Deregulated enhancement of FDW is not a way to go
>>>> 2. Custom-path that can replace built-in scan makes sense as a first step
>>>> towards the future enhancement. Its planner integration is enough simple
>>>> to do.
>>>> 3. Custom-path that can replace built-in join takes investigation how to
>>>> integrate existing planner structure, to avoid (3a) reinvention of
>>>> whole of join handling in extension side, and (3b) unnecessary extension
>>>> calls towards the case obviously unsupported.
>>>>
>>>> So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
>>>> on the v9.5 development cycle. Also, we will be able to have discussion
>>>> for the (3) portion concurrently, probably, towards 2nd commit-fest.
>>>>
>>>> Unfortunately, I cannot participate PGcon/Ottawa this year. Please share
>>>> us the face-to-face discussion here.
>>>>
>>>> Thanks,
>>>> --
>>>> NEC OSS Promotion Center / PG-Strom Project
>>>> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>>>>
>>> --
>>> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>>
>>
>>
>> --
>> Shigeru HANADA
>
>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

--
Shigeru HANADA

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-07-03 14:11:54 Re: Proposing pg_hibernate
Previous Message Atri Sharma 2014-07-03 13:42:15 Re: bad estimation together with large work_mem generates terrible slow hash joins