transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]

From: Chapman Flack <jcflack(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]
Date: 2025-04-16 02:10:59
Message-ID: 67FF11B3.10200@acm.org
Lists: pgsql-hackers

On 04/15/25 10:52, Tom Lane wrote:
> The problem from a PL's standpoint is "given this input or output
> of type FOO, should I transform it, and if so using what?". So
> the starting point has to be a type not a transform. ... protrftypes data
> is used as a filter before attempting a pg_transform lookup. If the pg_proc
> entry contained transform OIDs we'd have to filter after the lookup,

I don't think I follow. If a function is declared in LANGUAGE BAR with
TRANSFORM FOR TYPE FOO, then either:

1. There is no transform declared for (trftype foo, trflang bar).
CREATE FUNCTION fails, 42704: transform for type foo language "bar"
does not exist.

2. CREATE FUNCTION succeeds and there is a transform
(trftype foo, trflang bar, trffromsql f1, trftosql f2).

So the choice of what to put in pg_proc is between the oid of type foo,
or the oid of the transform (foo, bar, f1, f2).

If the function is going to encounter type foo at all, it cannot avoid
looking up the transform, either by the transform oid or by (type, lang).

The only case in which that lookup could be avoided would be where
a function declared with TRANSFORM FOR TYPE FOO actually sees no input
or output arguments or internally-SQL-encountered values of type foo in
a given call or at a given call site.

That seems to me like an edge case, so I really am questioning this:

> which is pretty inefficient, especially if you expect that the normal
> case is that there's not an applicable transform.

I *don't* expect that. I expect that if someone took the time to declare
the function with TRANSFORM FOR TYPE FOO and CreateFunction confirmed
that transform existed, the function is expected to encounter values
of type foo and apply that intended transform to them.

Perhaps the efficiency argument is really "say a function has
a list of 100 arguments and only one is of type foo, how many cycles
are wasted in get_transform_tosql and get_transform_fromsql applied
to all those other types?"

At present, they return quickly when the passed typid isn't found
in the passed list of trftypes.
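
As a rough illustration of that filter-first behavior, here is a
self-contained sketch. It is not the code in lsyscache.c: the lookup stub,
the counter, and the made-up function oids are invented stand-ins for the
real catalog lookup, and the trftypes list is simplified to a plain array.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical counter standing in for the cost of a pg_transform lookup. */
int catalog_lookups = 0;

/*
 * Hypothetical stand-in for the catalog lookup keyed by (type, language).
 * It only counts calls and returns a made-up function oid.
 */
Oid
lookup_tosql_by_type_lang(Oid typid, Oid langid)
{
    (void) langid;
    catalog_lookups++;
    return typid + 1000;
}

/*
 * Filter first: if typid is not among the types the function declared
 * transforms for, return InvalidOid without touching the catalog.
 */
Oid
sketch_get_transform_tosql(Oid typid, Oid langid,
                           const Oid *trftypes, size_t ntrftypes)
{
    assert(trftypes != NULL || ntrftypes == 0);

    for (size_t i = 0; i < ntrftypes; i++)
    {
        if (trftypes[i] == typid)
            return lookup_tosql_by_type_lang(typid, langid);
    }
    return InvalidOid;
}
```

With one declared type among many arguments, only that one argument pays
for a lookup; every other type falls out of the loop.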

If pg_proc had protransforms instead, that would add a step zero: looking
up the declared transforms to make an in-memory list of (typid, tosqloid,
fromsqloid). After that, get_transform_{tosql,fromsql} would be applied
and return quickly when the passed typid isn't in that list. When it is
in the list, they'd return just as quickly the transform function oid
directly from the list. For a routine with no transforms declared,
step zero completes in the time it takes to see protransforms empty and
return an empty list.
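
A minimal sketch of what that step zero could look like, assuming the
hypothetical protransforms column holds pg_transform oids. The struct
names, the plain array standing in for the catalog, and the function names
are all made up for illustration; none of this exists in the tree.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Simplified stand-in for a pg_transform row. */
typedef struct SketchTransformRow
{
    Oid oid, trftype, trflang, trffromsql, trftosql;
} SketchTransformRow;

/* In-memory (typid, tosqloid, fromsqloid) entry built once per function. */
typedef struct SketchTransformEntry
{
    Oid typid, tosql, fromsql;
} SketchTransformEntry;

/*
 * Step zero: resolve each declared transform oid to an entry.
 * Returns the number of entries written; zero when nothing is declared.
 */
size_t
sketch_build_transform_list(const Oid *protransforms, size_t ndeclared,
                            const SketchTransformRow *catalog, size_t nrows,
                            SketchTransformEntry *out)
{
    size_t n = 0;

    assert(out != NULL || ndeclared == 0);
    for (size_t i = 0; i < ndeclared; i++)
        for (size_t j = 0; j < nrows; j++)
            if (catalog[j].oid == protransforms[i])
            {
                out[n].typid = catalog[j].trftype;
                out[n].tosql = catalog[j].trftosql;
                out[n].fromsql = catalog[j].trffromsql;
                n++;
                break;
            }
    return n;
}

/* Later calls consult only the in-memory list, never the catalog. */
Oid
sketch_tosql(const SketchTransformEntry *list, size_t n, Oid typid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].typid == typid)
            return list[i].tosql;
    return InvalidOid;
}

Oid
sketch_fromsql(const SketchTransformEntry *list, size_t n, Oid typid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].typid == typid)
            return list[i].fromsql;
    return InvalidOid;
}
```

The catalog is consulted once per declared transform, in the build step;
a type that is not in the list costs only the scan of a short array.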

Now the question becomes: how many cycles does step zero spend in excess
of those that must be spent for the function to have its intended behavior?

My answer would be "zero, except in the vanishingly perverse case of
a function declared with transforms for types it never sees."

Am I mistaken?

On 04/15/25 01:05, Pavel Stehule wrote:
> There was a long discussion, and I think the main reason for this design
> is a possibility to use an old code without change although the
> transformations are installed. And secondly there was a possibility to use
> a transformation when it was installed, and use it although it was
> installed after the function was created. ... I can write code
> in PL/Python that can work with or without transformation
> ...
>
https://www.postgresql.org/message-id/flat/1339713732.11971.79.camel%40vanquo.pezone.net

Thank you for the link to that old thread. I can see now that the first
version of the patch in mid-2012 had CREATE and DROP TRANSFORM but did
not add CREATE FUNCTION syntax for which transforms to use; in that
version, it did entail the idea of transforms being selected for use
just by existing. But the problem of that changing the behavior of existing
functions was already recognized by the fifth message in the thread, where
Peter aptly used the words "worst nightmare".

By November of 2013 there were already suggestions about explicit
CREATE FUNCTION syntax for what transforms to apply, and by January 2014
Peter had found the ISO SQL <transform group specification> that does
exactly that, and by April 2015 the patch was adopted in that form.

I'll guess that the 'bogus' dependency creation, based only on argument
and return types, that Tom fixed last week in b73e6d7, was a vestige from
the earliest "apply whatever transforms exist" version of the patch that
didn't get changed when the explicit function-creation syntax was added.

At any rate, we are now firmly in the world of "apply exactly the
transforms explicitly requested at function creation", and that mirrors
ISO SQL and seems the sanest world to me.

While I don't doubt that a PL/Python function could be written that would
work with or without a transform, that sounds to me like the kind of feat
undertaken to show it can be done. My normal expectation would be for ~100%
of real-world functions to faceplant if their data suddenly started
arriving in completely different datatypes. And we have, wisely, chosen
a design that rules that out.

Regards,
-Chap
