Re: Custom Scans and private data

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Scans and private data
Date: 2015-08-27 23:53:51
Message-ID: 20150827235351.GA4147@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2015-08-27 18:59:13 -0400, Tom Lane wrote:
> But I do think that letting custom-scan extensions fool about with
> the semantics of plan-copying (and plan-serialization for that matter)
> is a risky choice that is not justified by some marginal gains in code
> readability

To me the likelihood of having bugs in manual (de)serilization via lists
et al. is higher than introducing bugs via copyfuncs. Or even
out/readfuncs, especially as we could easily add a bunch of
verifications to the latter by verifying field names in READ_*. Wether a
prepared statement fails because a copyfuncs hook failed, or whether
it's in the fdw itself doesn't seem to make that much of a difference.

> especially when there are other feasible ways to attack the
> readability problem.

Lets at least add a Value variant that can store strings that aren't
null terminated, that'd already make things a lot better. Right now you
need to reuse Const nodes and stuff a bytea in there or such.

But even then, that only works if you have a single flat datastructure
that doesn't have any pointers in it. copyObject() style copying can
easily cope with that, using a binary blob really can't.

As far as I can see you right now basically need to write code for
custom scans so that between planning and execution you unconditionally
do:
Planning:
1) create your internal data
2) serialize data,
Execution:
3) unserialize data
4) use unserialized data

There's no way to stash data between 2 and 3 anywhere, even if there's
no need for the whole copying rigamarole because it was just a plain
statement.

If you look at the various *Scan, *Join etc. nodes we have in core, most
of them have a bunch of variable length data structures and pointers in
them. All that you can't sanely do for a custom scan node.

> copyObject needs to be a pretty self-contained operation with not a
> lot of external dependencies

Why?

> > [ ruminations about how to improve the system's extensibility ]
>
> Yeah, I've entertained similar thoughts in the past. It's not very clear
> how we get there from here, though --- for instance, any fundamental
> change in the Node system would break so much code that it's unlikely
> anyone would thank us for it, even assuming we ever finished the
> conversion project.

How about adding a single 'ExtensibleNode' node type (inheriting from
Node as normal). Many of the switches over node types would gain one
additional entry for that, and do something like
case T_ExtensibleNode:
handler = LookupHandler((ExtensibleNode *) node)->extended_type, Handler_ExecInitNode);
if (handler)
{
result = ((ExecInitNodeHandler *) handler)(node, estate, eflags);
break;
}
/* fall through to error */
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

that should be doable one-by-one in many of these bigger
switches. There's obviously some limits where that's doable, but I don't
think it'd break much code?

How exactly to identify extended_type in a way that makes sense
e.g. between restarting pg, backends, primary/standby isn't trivial, but
I do think it should be doable.

> I'm also less than convinced that "I should be able to install major new
> features without touching the core code" is actually a sane or useful goal
> to have.

Yea, I have some doubts there as well. There's a lot of features though
which we really don't want in core, where some of the limitations of the
node system really do become problematic.

> As a concrete example, you mentioned the lack of extensibility of the
> bison parser, which is undoubtedly a blocking issue if you want a new
> feature that would require new SQL syntax.

While that's a problem, where I really don't have any good answer, I
also think in many cases it's primarily DDL, and primarily extending
existing DDL. By far the most things I've seen want to add informations
to tables and/or columns - and storage options actually kinda give you
that on the grammar level. We just don't allow you too sanely hook into
it, and there's a bunch of other pointless restrictions.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-08-28 00:01:12 Re: PostgreSQL for VAX on NetBSD/OpenBSD
Previous Message Tom Lane 2015-08-27 23:28:25 Re: PostgreSQL for VAX on NetBSD/OpenBSD