Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 13:39:21
Message-ID: 162867790911240539v3aa7e091g583aa6e77e6dcfe7@mail.gmail.com
Lists: pgsql-hackers

2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
>>> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> Hello
>>>>
>>>> I think this patch may be a good idea. I am missing a better
>>>> function specification. Specification by name isn't enough - we can
>>>> have overloaded functions. This syntax doesn't allow an explicit
>>>> cast - from my personal view, the syntax is ugly - with a type
>>>> specification we don't need the keyword FUNCTION
>>>
>>> As long as things continue to support the INTERNAL-type behavior for
>>> extremely low overhead bulk transfers I am open to suggestions about
>>> how to enrich things...but how would I do so under this proposal?
>>>
>>
>> Using an INTERNAL type is wrong. It breaks the design of these functions
>> for the usual PLs. I don't see any reason why it's necessary.
>>
>>> I am especially fishing for suggestions in the direction of managing
>>> state for the function between rows though...I don't like how the
>>> current design seems to scream "use a global variable."
>>>
>>>> We have a fast COPY statement - OK, we have a fast function - OK, but
>>>> inside the function we have to call a "slow" SQL query. What is
>>>> the advantage?
>>>
>>> The implementation here uses a type 'internal' for performance.  It
>>> doesn't even recompute the fcinfo because of the very particular
>>> circumstances of how the function is called.  It doesn't do a memory
>>> copy of the argument buffer either, to the best of my knowledge.  In
>>> the dblink patches you basically stream directly from the disk, format
>>> the COPY bytes, and shove it into a waiting COPY on another postgres
>>> node...there's almost no additional work in-between.  All utilized
>>> time would be some combination of the normal COPY byte stream
>>> generation and libpq.
>>>
>>
>> I understand, and I dislike it. This design isn't general - or it is
>> far from using a function. It doesn't use the complete FUNCAPI interface.
>> I think you need different semantics. You are not using a function.
>> You are using something like a "stream object". This stream object can
>> have input and output functions, and its parameters should be either
>> internal (I don't think internal would bring any significant
>> performance gain here) or standard. Syntax should be similar to
>> CREATE AGGREGATE.
>
> I think you might be right about this.  At the time I was too shy to
> add a DDL command for this hack, though.  But what I did want is a
> form of currying, and that's not easily accomplished in SQL without
> extension...
>

COPY is a PostgreSQL extension. If there are other related extensions - why not?
PostgreSQL has a lot of database objects beyond the SQL standard - see the
full-text search implementation. I am not sure whether STREAM is a good
keyword now. It could collide with STREAM from streaming databases.

>> then syntax should be:
>>
>> COPY table TO streamname(parameters)
>>
>> COPY table TO filestream('/tmp/foo.dta') ...
>> COPY table TO dblinkstream(connectionstring) ...
>
> I like this one quite a bit...it's a bit like an aggregate, except the
> initial condition can be set in a rather function-callish way.
>
> But that does seem to require making a DDL command, which leaves a
> nice green field.  In particular, we could then make as many hooks,
> flags, and options as we wanted, but sometimes there is a paradox of
> choice...I just did not want to count on Postgres being friendly
> to a new DDL command when writing this the first time.
>

Sure - nobody likes too many changes in gram.y. But a well-designed
general feature with a related SQL enhancement is more acceptable than a
quick, simple hack. Don't be in a hurry. This idea is good - but it needs:

a) a well-designed C API like:

initialise_functions(fcinfo) -- std fcinfo
consument_process_tuple(fcinfo) -- gets a standard row: Datum
    dvalues[] + row description
producent_process_tuple(fcinfo) -- returns a standard row: Datum
    dvalues[] + row description (look at the SRF API)
terminate_function(fcinfo)

I am sure this could be similar to the AGGREGATE API
+ some samples in contrib

b) well-designed PL/PerlU and PL/PythonU interfaces
+ some samples in the documentation
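
As a rough illustration of the lifecycle in (a), and of how per-stream
state can live in an object rather than in a global variable, here is a
minimal Python sketch. The names ListStream, copy_to, and the delimiter
parameter are hypothetical; they only model the proposed
initialise / process tuple / terminate sequence and are not PostgreSQL
code or an existing API.

```python
from typing import Any, Iterable, List, Sequence, Tuple

Row = Tuple[Any, ...]


class ListStream:
    """Hypothetical consumer stream that collects rows as text lines."""

    def __init__(self, delimiter: str = "\t") -> None:
        # Parameters are bound when the stream is created, in the spirit
        # of "COPY table TO streamname(parameters)".
        self.delimiter = delimiter
        self.columns: Sequence[str] = ()
        self.lines: List[str] = []
        self.closed = False

    def initialise(self, columns: Sequence[str]) -> None:
        # Called once before the first row, with the row description.
        self.columns = tuple(columns)

    def process_tuple(self, row: Row) -> None:
        # Called once per row; state is kept on the object between calls.
        if len(row) != len(self.columns):
            raise ValueError("row does not match the row description")
        self.lines.append(self.delimiter.join(str(v) for v in row))

    def terminate(self) -> int:
        # Called once after the last row; returns the number of rows seen.
        self.closed = True
        return len(self.lines)


def copy_to(columns: Sequence[str], rows: Iterable[Row], stream: ListStream) -> int:
    """Drive a stream through initialise, per-row processing, and terminate."""
    stream.initialise(columns)
    for row in rows:
        stream.process_tuple(row)
    return stream.terminate()
```

A call such as copy_to(["id", "name"], rows, ListStream(delimiter=","))
plays the role of the COPY statement here: the caller configures the
stream once, and the driver only ever sees the three lifecycle calls.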

Regards
Pavel Stehule

>
>
