Re: delta relations in AFTER triggers

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: delta relations in AFTER triggers
Date: 2014-09-01 17:07:42
Message-ID: 5404A7DE.5040204@vmware.com
Lists: pgsql-hackers

On 08/30/2014 12:15 AM, Kevin Grittner wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>> On 08/28/2014 12:03 AM, Kevin Grittner wrote:
>>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>>>> I suggest adding a new hook to the ParseState struct, (p_rangevar_hook
>>>> ?). The planner calls it whenever it sees a reference to a table, and
>>>> the hook function returns back some sort of placeholder reference to the
>>>> tuplestore. With variables, the hook returns a Param node, and at
>>>> execution time, the executor calls the paramFetch hook to fetch the
>>>> value of the param. For relations/tuplestores, I guess we'll need to
>>>> invent something like a Param node, but for holding information about
>>>> the relation. Like your TsrData struct, but without the pointer to the
>>>> tuplestore. At execution time, in the SPI_execute call, you pass the
>>>> pointer to the tuplestore in the ParamListInfo struct, like you pass
>>>> parameter values.
>>>>
>>>> Does this make sense?
>>>
>>> I see your point, but SPI first has to be made aware of the
>>> tuplestores and their corresponding names and TupleDesc structures.
>>> Does it make sense to keep the SPI_register_tuplestore() and
>>> SPI_unregister_tuplestore() functions for the client side of the
>>> API, and pass things along to the parse analysis through execution
>>> phases using the techniques you describe?
>>
>> Sorry, I didn't understand that. What do you mean by "first", and the
>> "client side of the API"? I don't see any need for the
>> SPI_register_tuplestore() and SPI_unregister_tuplestore() functions
>> if you use the hooks.
>
> If we were to go with the hooks as you propose, we would still need
> to take the information from TriggerData and put it somewhere else
> for the hook to reference.

Sure.

> The hooks are generalized for plpgsql,
> not just for triggers, and it doesn't seem appropriate for them to
> be fishing around in the TriggerData structure.

PLpgSQL_execstate seems like the appropriate place.

> And what if we add other sources for tuplestores?

What about it?

> The lookup during parse analysis
> each time an apparent relation name is encountered must be simple
> and fast.

We already use hooks for ColumnRefs, which are called even more often,
and we haven't had a problem making that fast enough.
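(For illustration, such a table-reference hook might look like the sketch below. None of these names exist; they just mirror the shape of the existing column-ref and param-ref hook machinery, and p_ref_hook_state is the existing pass-through field those hooks already share.)

```c
/*
 * Sketch only: a hook the parser would call when a RangeVar doesn't
 * resolve to a real relation.  The hook returns a placeholder node for
 * the tuplestore, or NULL to fall through to the usual "relation does
 * not exist" error.  Invented names, analogous to p_paramref_hook.
 */
typedef Node *(*TableRefHook) (ParseState *pstate, RangeVar *relation);

/* hypothetical addition to struct ParseState: */
TableRefHook p_tableref_hook;
/* p_ref_hook_state already exists and could carry the hook's state */
```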

> I want named tuplestores to be easy to use from *all* PLs (for
> trigger usage) as well as useful for other purposes people may want
> to develop.

I'm not sure other PLs would even want to resolve the old/new relations
like PL/pgSQL does. It might be more natural to access the new/old
tuplestores as Perl or Python hashes or arrays, for example. But if they
do, it's not that difficult to write the hooks.

> I had to change the hashkey for plpgsql's plan
> caching, but that needs to be done regardless of the API (to
> prevent problems in the obscure case that someone attaches the same
> trigger function to the same table for the same events more than
> once with different trigger names and different transition table
> names). If you ignore that, the *entire* change to use this in
> plpgsql is to add these lines to plpgsql_exec_trigger():
>
> /*
>  * Capture the NEW and OLD transition TABLE tuplestores (if specified for
>  * this trigger).
>  */
> if (trigdata->tg_newtable)
> {
>     Tsr tsr = palloc(sizeof(TsrData));
>
>     tsr->name = trigdata->tg_trigger->tgnewtable;
>     tsr->tstate = trigdata->tg_newtable;
>     tsr->tupdesc = trigdata->tg_relation->rd_att;
>     tsr->relid = trigdata->tg_relation->rd_id;
>     SPI_register_tuplestore(tsr);
> }
> if (trigdata->tg_oldtable)
> {
>     Tsr tsr = palloc(sizeof(TsrData));
>
>     tsr->name = trigdata->tg_trigger->tgoldtable;
>     tsr->tstate = trigdata->tg_oldtable;
>     tsr->tupdesc = trigdata->tg_relation->rd_att;
>     tsr->relid = trigdata->tg_relation->rd_id;
>     SPI_register_tuplestore(tsr);
> }
>
> With the new SPI functions, the code to implement this in each
> other PL should be about the same (possibly identical), and areas
> using SPI only need similar code to make tuplestores visible to the
> planner and usable in the executor if someone has another use for
> this. You just do the above once you have run SPI_connect() and
> before preparing or executing any query that references the named
> tuplestore.

With hooks, the code to implement them in other PLs would be about the
same too, if they want the same behavior.

> It remains available on that SPI connection until
> SPI_finish() is called or you explicitly unregister it (by name).

Yeah, I don't like that. The SPI interface is currently stateless. Well,
except for cursors and plans explicitly saved with SPI_keepplan. But the
way queries are parsed is stateless - you pass all the necessary
information as part of the SPI_execute call (or similar), using direct
arguments and the ParamListInfo struct.

If you don't want to use hooks, I nevertheless feel that the old/new
relations should be passed as part of the ParamListInfo struct, one way
or another. With hooks, you would set the parserSetup hook, which in
turn would set up the table-ref hook similar to the column-ref hooks,
but if you don't want to do that, you could also add new fields directly
to ParamListInfo for the relations, like the "ParamExternData
params[1]" array that's there currently.
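(For concreteness, the direct-field variant could look something like the following. This is purely illustrative; the struct and field names are invented, not from any actual patch.)

```c
/*
 * Hypothetical sketch: carry named tuplestores in ParamListInfoData
 * alongside the existing parameter array, so SPI_execute callers can
 * pass relations the same way they pass parameter values.
 */
typedef struct NamedTuplestoreParam
{
    const char      *name;    /* name queries refer to, e.g. "new" */
    Tuplestorestate *tstate;  /* the rows themselves */
    TupleDesc        tupdesc; /* their row descriptor */
} NamedTuplestoreParam;

/* invented fields added to ParamListInfoData, next to numParams/params: */
int                   numTsrs; /* number of named tuplestores */
NamedTuplestoreParam *tsrs;    /* array of them */
```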

> If we use the hooks, I think it will be several times as much code,
> more invasive, and probably more fragile. More importantly, these
> hooks are not used by the other PLs included with core, and are not
> used in any of the other core code, anywhere. They all use SPI, so
> they can do the above very easily, but adding infrastructure for
> them to use the hooks would be a lot more work, and I'm not seeing
> a corresponding benefit.

If they use SPI, they can use the hooks just as well.

> I think there is some room to change the API as used by the parser,
> planner, and executor so that no changes are needed there if we add
> some other registration mechanism besides SPI, but I think having a
> registry for tuplestores that sort of takes the place of the
> catalogs and related caches (but is far simpler, being process
> local) is a better model than what you propose.
>
> In summary, thinking of the definition of a named tuplestore as a
> variable is the wrong parallel; it is more like a lightweight
> relation, and the comparison should be to that, not to function
> parameters or local variables.

I'm not too convinced. Marti Raudsepp mentioned refcursor variables, and
I think he's on to something. Refcursors are a bit difficult to
understand, so I wouldn't use those directly, but it would make a lot of
sense if you could e.g. loop over the rows directly with a FOR loop,
like "FOR recordvar IN new LOOP ... END LOOP", without having to do
"SELECT * FROM new". Introducing a new "relation variable" datatype for
this would be nice. I won't insist that you have to implement that right
now, but let's not foreclose the option to add it later.

BTW, do we expect the old/new relations to be visible to dynamic SQL,
i.e. EXECUTE? I think most users would say yes, even though the old/new
row variables are not - you have to pass them with EXECUTE USING.
Admittedly that supports thinking of them more as lightweight relations
rather than variables.
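(To illustrate the contrast; the transition table name "newtable" here is hypothetical:)

```sql
-- Row variables are invisible inside EXECUTE and must be passed explicitly:
EXECUTE 'SELECT ($1).id' INTO v USING NEW;

-- Whereas, if transition tables behave like lightweight relations, one
-- would expect this to work with no USING clause at all:
EXECUTE 'SELECT count(*) FROM newtable' INTO n;
```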

Another question is whether you would expect the NEW/OLD table to be
visible to functions that you call from the trigger. For example, if the
trigger does:

...
PERFORM myfunction()
...

Can myfunction do "SELECT * FROM new" ? If not, is there a work-around?
I guess that's not this patch's problem, because we don't have a way to
pass sets as arguments in general. Refcursor is the closest thing, but
it's cumbersome.
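(A sketch of that refcursor workaround; the function name and the transition table name "newtable" are made up:)

```sql
CREATE FUNCTION myfunction(c refcursor) RETURNS void AS $$
DECLARE
    r record;
BEGIN
    LOOP
        FETCH c INTO r;          -- pull rows from the caller's cursor
        EXIT WHEN NOT FOUND;
        -- process r here
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- and in the trigger function body:
-- OPEN curs FOR SELECT * FROM newtable;  -- "newtable" is hypothetical
-- PERFORM myfunction(curs);
```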

In any case, I think having the parser call back into SPI, in
parse_tuplestore.c, is a modularity violation. The tuplestores need to
somehow find their way into ParseState. If you go with the
SPI_register_tuplestore API, then you should still have the tuplestores
in the ParseState struct, and have SPI fill in that information. Now
that I look back, I think you also referred to that earlier in the
paragraph that I didn't understand. Let me try again:

> I see your point, but SPI first has to be made aware of the
> tuplestores and their corresponding names and TupleDesc structures.
> Does it make sense to keep the SPI_register_tuplestore() and
> SPI_unregister_tuplestore() functions for the client side of the
> API, and pass things along to the parse analysis through execution
> phases using the techniques you describe? That would eliminate the
> need for the SPI_get_caller_tuplestore() function and the
> parse_tuplestore.[ch] files, and change how the data is fetched in
> parse analysis and execution phases, but that seems fairly minimal
> -- there are exactly three places that would need to call the new
> hooks where the patch is now getting the information from the
> registry.

Yes, if you decide to keep the SPI_register_tuplestore() function, then
IMHO you should still use hooks to pass that information to the parser,
planner and executor.

- Heikki
