From: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com> |
Subject: | Re: Faster methods for getting SPI results |
Date: | 2017-01-06 03:50:52 |
Message-ID: | a0df5cc1-d613-0039-5426-3fb115e9c5f1@BlueTreble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/28/16 3:14 AM, Craig Ringer wrote:
> On 28 December 2016 at 12:32, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>> On 12/27/16 9:10 PM, Craig Ringer wrote:
>>>
>>> On 28 December 2016 at 09:58, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>>>
>>>> I've looked at this some more, and ITSM that the only way to do this
>>>> without
>>>> some major surgery is to create a new type of Destination specifically
>>>> for
>>>> SPI that allows for the execution of an arbitrary C function for each
>>>> tuple
>>>> to be sent.
>>>
>>>
>>> That sounds a lot more sensible than the prior proposals. Callback driven.
>>
>>
>> Are there other places this would be useful? I'm reluctant to write all of
>> this just to discover it doesn't help performance at all, but if it's useful
>> on it's own I can just submit it as a stand-alone patch.
>
> I don't have a use for it personally. In BDR and pglogical anything
> that does work with nontrivial numbers of tuples uses lower level
> scans anyway.
>
> I expect anything that uses the SPI to run arbitrary user queries
> could have a use for something like this though. Any PL, for one.
Just a quick update on this: I've gotten this working well enough in
plpython to do some performance testing. This patch does change python
results from being a list of dicts to a dict of lists, but I suspect the
vast majority of the speed improvement is from not creating a tuplestore.
The attached sample (from OS X /usr/bin/sample) is interesting. The
highlight is:
> ! 3398 SPI_execute_callback (in postgres) + 163 [0x110125793]
> ! 3394 _SPI_execute_plan (in postgres) + 1262 [0x1101253fe]
> ! : 2043 standard_ExecutorRun (in postgres) + 288 [0x1100f9a40]
> ! : | 1990 ExecProcNode (in postgres) + 250 [0x1100fd62a]
The top line is the entry into SPI from plpython. The bottom line is
generate_series into a tuplestore and then reading from that tuplestore.
Almost all the time being spent in standard_ExecutorRun is in
PLy_CSreceive, which is appending values to a set of python lists as
it's getting tuples.
The other noteworthy item in the sample is this:
> 535 list_dealloc (in Python) + 116,103,... [0x11982b1b4,0x11982b1a7,...]
that's how long it's taking python to free the 3 lists (each with
9999999 python int objects).
In short (and at best*), this makes plpython just as fast at processing
results as SELECT count(SELECT s, s, s FROM generate_series()).
The * on that is there's something odd going on where plpython starts
out really fast at this, then gets 100% slower. I've reached out to some
python folks about that. Even so, the overall results from a quick test
on my laptop are (IMHO) impressive:
Old Code New Code Improvement
Pure SQL 2 sec 2 sec
plpython 12.7-14 sec 4-10 sec ~1.3-3x
plpython - SQL 10.7-12 sec 2-8 sec ~1.3-6x
Pure SQL is how long an equivalent query takes to run with just SQL.
plpython - SQL is simply the raw python times minus the pure SQL time.
I suspect other PL languages that have fairly fast object alloc and
dealloc would see a similar benefit.
BTW, the patch currently breaks on nested calls to plpython, but I don't
think that should change the performance much.
The test function:
> CREATE OR REPLACE FUNCTION test_series(
> iter int
> ) RETURNS int LANGUAGE plpythonu AS $body$
> d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{}) s'.format(iter) )
> return len(d['some_table_id'])
> $body$;
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Attachment | Content-Type | Size |
---|---|---|
plpython_callbeck_v2.diff | text/plain | 16.0 KB |
sample.txt | text/plain | 37.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2017-01-06 04:07:39 | Re: Logical decoding on standby |
Previous Message | Andrew Gierth | 2017-01-06 03:48:58 | Hash support for grouping sets |