From: | Jim Nasby <jim(dot)nasby(at)openscg(dot)com> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com> |
Subject: | Re: Faster methods for getting SPI results (460% improvement) |
Date: | 2017-04-06 02:44:32 |
Message-ID: | a96b9506-5587-7290-9eed-8830ce7f318e@openscg.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4/4/17 7:44 PM, Craig Ringer wrote:
> On 5 April 2017 at 08:23, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 5 April 2017 at 08:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> This patch fails to update the documentation at all.
>
> https://www.postgresql.org/docs/devel/static/spi.html
I'll fix that soon.
> missing newline
Fixed.
> +/* Execute a previously prepared plan with a callback Destination */
>
>
> caps "Destination"
Hmm, I capitalized it since DestReceiver is capitalized. I guess it's
best to just drop Destination.
> + // XXX throw error if callback is set
Fixed (opted to just use an Assert).
> +static DestReceiver spi_callbackDR = {
> + donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
> + DestSPICallback
> +};
> Presumably that's a default destination you're then supposed to modify
> with your own callbacks? There aren't any comments to indicate what's
> going on here.
Correct. Added the following comment:
> /*
> * This is strictly a starting point for creating a callback. It should not
> * actually be used.
> */
> Comments on patch 2:
>
> If this is the "bare minimum" patch, what is pending? Does it impose
> any downsides or limits?
No limits. I'm not aware of any downsides.
It's "bare minimum" because a follow-on step is to allow different
methods of returning results. In particular, my testing indicates that
patch 1 + returning a dict of lists (as opposed to the current list of
dicts) results in a 460% improvement vs ~30% with patch 2.
> +/* Get switch execution contexts */
> +extern PLyExecutionContext
> *PLy_switch_execution_context(PLyExecutionContext *new);
>
> Comment makes no sense to me. This seems something like memory context
> switch, where you supply the new and return the old. But the comment
> doesn't explain it at all.
Comment changed to
/* Switch execution context (similar to MemoryContextSwitchTo */
> +void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo);
> +void PLy_CSDestroy(DestReceiver *self);
>
> These are declared in the plpy_spi.c. Why aren't these static?
Derp. Fixed.
> + volatile MemoryContext oldcontext;
> + volatile ResourceOwner oldowner;
> int rv;
> - volatile MemoryContext oldcontext;
> - volatile ResourceOwner oldowner;
>
> Unnecessary code movement.
IMHO it reads better that way. I've left it for now so $COMMITTER can
decide, but if it really bugs you let me know and I'll swap it.
> In PLy_Callback_New, I think your use of a sub-context is sensible. Is
> it necessary to palloc0 though?
Hrm, maybe not... but it seems like cheap insurance considering the
amount of other stuff involved in just starting a new SPI call. And
honestly, I'd rather not mess with it at this point. :) I have added an
XXX comment though.
> +static CallbackState *
> +PLy_Callback_Free(CallbackState *callback)
>
> The code here looks like it could be part of spi.c's callback support,
> rather than plpy_spi specific, since you provide a destroy callback in
> the SPI callback struct.
I added this for use in PG_CATCH() blocks, because it's not clear to me
that the portal gets cleaned up in those cases. It's certainly possible
that it's pointless.
FWIW, I'm pretty sure I copied that pattern from somewhere else...
probably plpgsql or pltcl.
> + /* We need to store this because the TupleDesc the receive
> function gets has no names. */
> + myState->desc = typeinfo;
>
> Is it safe to just store the pointer to the TupleDesc here? What's the lifetime?
Only Portal lifetime.
> + * will clean it up when the time is right. XXX This might result in a leak
> + * if an error happens and the result doesn't get dereferenced.
> + */
> + MemoryContextSwitchTo(TopMemoryContext);
> + result->tupdesc = CreateTupleDescCopy(typeinfo);
>
> especially given this XXX comment...
I've changed the comment to the following. Hopefully this clarifies things:
> /*
> * Save tuple descriptor for later use by result set metadata
> * functions. Save it in TopMemoryContext so that it survives outside of
> * an SPI context. We trust that PLy_result_dealloc() will clean it up
> * when the time is right. The difference between result and everything
> * else is that result needs to survive after the portal is destroyed,
> * because result is what's handed back to the plpython function. While
> * it's tempting to use something other than TopMemoryContext, that won't
> * work: the user could potentially put result into the global dictionary,
> * which means it could survive as long as the session does. This might
> * result in a leak if an error happens and the result doesn't get
> * dereferenced, but if that happens it means the python GC has failed us,
> * at which point we probably have bigger problems.
> *
> * This still isn't perfect though; if something the result tupledesc
> * references has it's OID changed then the tupledesc will be invalid. I'm
> * not sure it's worth worrying about that though.
> */
Updated patches attached, but I still need to update the docs.
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com
Attachment | Content-Type | Size |
---|---|---|
0001-Add-SPI_execute_callback.patch | text/plain | 8.7 KB |
0002-Switch-plpython-to-using-SPI_execute_callback.patch | text/plain | 14.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2017-04-06 03:46:07 | Re: Faster methods for getting SPI results (460% improvement) |
Previous Message | Masahiko Sawada | 2017-04-06 02:33:22 | Interval for launching the table sync worker |