From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, Vik Reykja <vikreykja(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: proposal: fix corner use case of variadic fuctions usage |
Date: | 2013-01-19 18:18:51 |
Message-ID: | CAFj8pRAhO1UX+k+rhf5SKTgcQzF0ihcaWKfdiUqQJcnGrnqOvA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Hello
2013/1/18 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> [ quick review of patch ]
>
> On reflection it seems to me that this is probably not a very good
> approach overall. Our general theory for functions taking ANY has
> been that the core system just computes the arguments and leaves it
> to the function to make sense of them. Why should this be different
> in the one specific case of VARIADIC ANY with a variadic array?
>
I am not sure if I understand well to question?
Reason why VARIADIC "any" is different than VARIADIC anyarray is
simple - we have only single type arrays - there are no "any"[] - and
we use combination FunctionCallInfoData structure for data and parser
expression nodes for type specification. And why we use VARIADIC "any"
function? Due our coerce rules - that try to find common coerce type
for any unknown (polymorphic) parameters. This coercion can be
acceptable - and then people can use VARIADIC "anyarray" or
unacceptable - and then people should to use VARIADIC "any" - for
example we would not lost type info for boolean types. Next motivation
for VARIADIC "any" - implementation is very simple and fast - nobody
have to do packing and unpacking array - just push values to narg, arg
and argnull fields of FunctionCallInfoData structure. There are no
necessary any operations with data. There are only one issue - this
corner case.
> The approach is also inherently seriously inefficient. Not only
> does ExecMakeFunctionResult have to build a separate phony Const
> node for each array element, but the variadic function itself can
> have no idea that those Consts are all the same type, which means
> it's going to do datatype lookup over again for each array element.
> (format(), for instance, would have to look up the same type output
> function over and over again.) This might not be noticeable on toy
> examples with a couple of array entries, but it'll be a large
> performance problem on large arrays.
yes, "format()" it actually does it - in all cases. And it is not best
- but almost of overhead is masked by using sys caches.
What is important - for this use case - there is simple and perfect
possible optimization - in this case "non variadic manner call of
variadic "any" function" all variadic parameters will share same type.
Inside variadic function I have not information so this situation is
in this moment, but just I can remember last used type - and I can
reuse it, when parameter type is same like previous parameter.
So there no performance problem.
>
> The patch also breaks any attempt that a variadic function might be
> making to cache info about its argument types across calls. I suppose
> that the copying of the FmgrInfo is a hack to avoid crashes due to
> called functions supposing that their flinfo->fn_extra data is still
> valid for the new argument set. Of course that's another pretty
> significant performance hit, compounding the per-element hit. Whereas
> an ordinary variadic function that is given N arguments in a particular
> query will probably do N datatype lookups and cache the info for the
> life of the query, a variadic function called with this approach will
> do one datatype lookup for each array element in each row of the query;
> and there is no way to optimize that.
>
I believe so cache of last used datatype and related function can be
very effective and enough for this possible performance issues.
> But large arrays have a worse problem: the approach flat out does
> not work for arrays of more than FUNC_MAX_ARGS elements, because
> there's no place to put their values in the FunctionCallInfo struct.
> (As submitted, if the array is too big the patch will blithely stomp
> all over memory with user-supplied data, making it not merely a crash
> risk but probably an exploitable security hole.)
agree - FUNC_MAX_ARGS should be tested - it is significant security
bug and should be fixed.
>
> This last problem is, so far as I can see, unfixable within this
> approach; surely we are not going to accept a feature that only works
> for small arrays. So I am going to mark the CF item rejected not just
> RWF.
>
disagree - non variadic manner call should not be used for walk around
FUNC_MAX_ARGS limit. So there should not be passed big array.
If somebody need to pass big array to function, then he should to use
usual non variadic function with usual array type parameter. He should
not use a VARIADIC parameter. We didn't design variadic function to
exceeding FUNC_MAX_ARGS limit.
So I strongly disagree with rejection for this argument. By contrast -
a fact so we don't check array size when variadic function is not
called as variadic function is bug.
Any function - varidic or not varidic in any use case have to have max
FUNC_MAX_ARGS arguments. Our internal variadic functions - that are
+/- VARIADIC "any" functions has FUNC_MAX_ARGS limit.
> I believe that a workable approach would require having the function
> itself act differently when the VARIADIC flag is set. Currently there
> is no way for ANY-taking functions to do this because we don't record
> the use of the VARIADIC flag in FuncExpr, but it'd be reasonable to
> change that, and probably then add a function similar to
> get_fn_expr_rettype to dig it out of the FmgrInfo. I don't know if
> we could usefully provide any infrastructure beyond that for the case,
> but it'd be worth thinking about whether there's any common behavior
> for such functions.
You propose now something, what you rejected four months ago.
your proposal is +/- same like my second proposal and implementation
that I sent some time ago. I have not a problem with it - and I'll
rewrite it, just I recapitulate your objection
a) anybody should to fix any existent variadic "any" function
separately - this fix is not general
b) it increase complexity (little bit) for this kind of variadic functions.
please, can we define agreement on strategy how to fix this issue?
I see two important questions?
a) what limits are valid for variadic functions? Now FUNC_MAX_ARGS
limit is ignored sometime - is it ok?
b) where array of variadic parameters for variadic "any" function
should be unnested? two possibilities - executor (increase complexity
of executor) or inside variadic function - (increase complexity of
variadic function).
I wrote three variants how to fix this issue - all variants has some
advantage and some disadvantage - and I believe so fourth and fifth
variant will be same - but all advantage and disadvantage are relative
well defined now - so we should to decide for one way.
No offensive Tom :), sincerely thank you for review
Best regards
Pavel
>
> regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2013-01-19 18:56:46 | Re: proposal: fix corner use case of variadic fuctions usage |
Previous Message | Gavin Flower | 2013-01-19 17:50:43 | Re: Combine Date and Time Columns to Timestamp |
From | Date | Subject | |
---|---|---|---|
Next Message | George Machitidze | 2013-01-19 18:45:15 | Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve |
Previous Message | Kevin Grittner | 2013-01-19 17:58:35 | Re: Query to help in debugging |