Re: [v9.5] Custom Plan API

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-11-22 18:42:45
Message-ID: 12415.1416681765@sss.pgh.pa.us
Lists: pgsql-hackers

Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> Let me explain the current idea of mine.
> CustomScan node will have a field that holds varnode mapping information
> that is constructed by the custom-scan provider in create_customscan_plan,
> if it wants. It is probably a list of varnodes.
> If it exists, setrefs.c changes its behavior: it updates varno/varattno of
> each varnode according to this mapping, as set_join_references() does
> based on indexed_tlist.
> To reference ecxt_scantuple, INDEX_VAR will be the best choice for varno
> of these varnodes, and the index into the above varnode mapping list will
> be varattno. It can be used to make EXPLAIN output, instead of the
> GetSpecialCustomVar hook.

> So, steps to go may be:
> (1) Add custom_private, custom_exprs, ... instead of self-defined data
> types based on CustomXXX.
> (2) Get rid of the SetCustomScanRef and GetSpecialCustomVar hooks for the
> current custom-"scan" support.
> (3) Integrate the above varnode mapping feature with the upcoming join
> replacement by custom-scan support.
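
For readers following along, the varnode mapping proposed in the quoted text can be pictured with a toy model. This is only a sketch in Python, not PostgreSQL code: the `Var` class, the `fix_custom_scan_vars` function, and the numeric value given to `INDEX_VAR` are all assumptions made for illustration. It shows a Var that appears in the provider's mapping list being rewritten so that varno becomes INDEX_VAR and varattno becomes its 1-based position in the list.

```python
from dataclasses import dataclass

# Illustrative stand-in for the special varno; the value is an assumption
# made for this sketch, not taken from the PostgreSQL headers.
INDEX_VAR = 65002


@dataclass(frozen=True)
class Var:
    varno: int     # range-table index, or INDEX_VAR after rewriting
    varattno: int  # attribute number, 1-based


def fix_custom_scan_vars(exprs, var_mapping):
    """Rewrite each Var found in var_mapping to (INDEX_VAR, position).

    Vars that are not in the mapping are left untouched.
    """
    rewritten = []
    for var in exprs:
        if var in var_mapping:
            rewritten.append(Var(INDEX_VAR, var_mapping.index(var) + 1))
        else:
            rewritten.append(var)
    return rewritten


# Hypothetical mapping built by a custom-scan provider at plan-creation time.
mapping = [Var(1, 3), Var(2, 1)]
tlist = [Var(2, 1), Var(1, 3), Var(1, 7)]
fixed = fix_custom_scan_vars(tlist, mapping)
```

Here the first two target-list entries end up pointing at slots 2 and 1 of the mapping, while `Var(1, 7)` is unchanged because the provider did not list it.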

Well ... I still do not find this interesting, because I don't believe
that CustomScan is a solution to anything interesting. It's difficult
enough to solve problems like expensive-function pushdown within the
core code; why would we tie one hand behind our backs by insisting that
they should be solved by extensions? And as I mentioned before, we do
need solutions to these problems in the core, regardless of CustomScan.

I think that a useful way to go at this might be to think first about
how to make use of expensive functions that have been cached in indexes,
and then see how the solution to that might translate to pushing down
expensive functions into FDWs and CustomScans. If you start with the
CustomScan aspect of it then you immediately find yourself trying to
design APIs to divide up the solution, which is premature when you
don't even know what the solution is.

The rough idea I'd had about this is that while canvassing a relation's
indexes (in get_relation_info), we could create a list of precomputed
expressions that are available from indexes, then run through the
query tree and replace any matching subexpressions with some Var-like
nodes (or maybe better PlaceHolderVar-like nodes) that indicate that
"we can get this expression for free if we read the right index".
If we do read the right index, such an expression reduces to a Var in
the finished plan tree; if not, it reverts to the original expression.
(Some thought would need to be given to the semantics when the index's
table is underneath an outer join --- that may just mean that we can't
necessarily replace every textually-matching subexpression, only those
that are not above an outer join.) One question mark here is how to do
the "replace any matching subexpressions" bit without O(lots) processing
cost in big queries. But that's probably just a SMOP. The bigger issue
I fear is that the planner is not currently structured to think that
evaluation cost of expressions in the SELECT list has anything to do
with which Path it should pick. That is tied to the handwaving I've
been doing for a while now about converting all the upper-level planning
logic into generate-and-compare-Paths style; we certainly cannot ignore
tlist eval costs while making those decisions. So at least for those
upper-level Paths, we'd have to have a notion of what tlist we expect
that plan level to compute, and charge appropriate evaluation costs.
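
The replace-then-resolve idea in the paragraph above can be sketched as a toy model. This is Python, not planner code, and every name in it (`IndexedExpr`, `replace_matching`, `resolve`, the `col<N>` naming of index columns) is hypothetical. It ignores the outer-join caveat and the tlist costing question entirely, and the matching is a naive recursive walk rather than an answer to the O(lots) concern.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    rel: str
    col: str


@dataclass(frozen=True)
class FuncExpr:
    name: str
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class IndexedExpr:
    """Hypothetical placeholder: 'expr' is free if we read 'index'."""
    index: str
    attno: int
    expr: "Expr"


Expr = Union[Var, FuncExpr, IndexedExpr]


def replace_matching(expr, available):
    """Wrap any subexpression that some index has precomputed.

    'available' maps an expression to (index name, index column number).
    """
    if expr in available:
        index, attno = available[expr]
        return IndexedExpr(index, attno, expr)
    if isinstance(expr, FuncExpr):
        new_args = tuple(replace_matching(a, available) for a in expr.args)
        return FuncExpr(expr.name, new_args)
    return expr


def resolve(expr, index_read):
    """Finish the plan: a placeholder becomes a Var on the index if that
    index is the one being read, otherwise the original expression."""
    if isinstance(expr, IndexedExpr):
        if expr.index == index_read:
            return Var(expr.index, f"col{expr.attno}")
        return expr.expr
    if isinstance(expr, FuncExpr):
        return FuncExpr(expr.name, tuple(resolve(a, index_read) for a in expr.args))
    return expr


# Hypothetical query fragment: upper(expensive_fn(t.x)), with an index
# on expensive_fn(t.x).
slow = FuncExpr("expensive_fn", (Var("t", "x"),))
query = FuncExpr("upper", (slow,))
available = {slow: ("t_expensive_idx", 1)}

marked = replace_matching(query, available)
with_index = resolve(marked, "t_expensive_idx")
without_index = resolve(marked, None)
```

In this sketch `with_index` has the expensive call replaced by a plain Var on the index column, while `without_index` is identical to the original `query`.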

So there's a lot of work there and I don't find that CustomScan looks
like a solution to any of it. CustomScan and FDWs could benefit from
this work, in that we'd now have a way to deal with the concept that
expensive functions (and aggregates, I hope) might be computed at
the bottom scan level. But it's folly to suppose that we can make it
work just by hacking some arms-length extension code without any
fundamental planner changes.

regards, tom lane
