Re: search_path for PL/pgSQL functions partially cached?

From: Jan Behrens <jbe-mlist(at)magnetkern(dot)de>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: search_path for PL/pgSQL functions partially cached?
Date: 2025-01-04 23:12:49
Message-ID: 20250105001249.cdca9dd84a3061c06f936ca7@magnetkern.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, 4 Jan 2025 09:37:14 -0800
Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:

> On 1/3/25 15:22, Jan Behrens wrote:
> > On Fri, 3 Jan 2025 13:56:02 -0800
> > Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> >> At this point I am lost as to what the overall goal of this is.
> >>
> >> Can you provide a 10000 ft view if what it is you are trying to achieve?
> >
> > Sure! I would like to create a component (e.g. a PostgreSQL extension)
> > that provides a function which processes some complex data, without
> > making any requirements regarding where the data is stored. To pass
> > this data to the function, I could use arrays of composite types, but
> > that seems to be very bulky. Another option would be to use cursors,
> > but that didn't turn out to work very smooth either.
> >
> > Instead, I plan to expect the function to receive a query string that
> > will get the data that is being processed by the function.
> >
> > That query string should be allowed to refer to tables in the
> > search_path at the caller's side.
> >
> > Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> > "CREATE FUNCTION" statement, because it would overwrite the current
> > search_path on each call of the function.
> >
> > Thus my idea is to do this (simplified):
> >
> > CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"
> >
> > --------------------------------------------------------------------
> > -- I cannot use SET search_path FROM CURRENT here, because "query_p"
> > -- shall refer to tables in the search_path of the caller.
> > --------------------------------------------------------------------
> >
> > LANGUAGE plpgsql AS $$
> > DECLARE
> > "old_search_path" TEXT;
> >
> > ----------------------------------------------------------------
> > -- I have to fully qualify types in the DECLARE section.
> > ----------------------------------------------------------------
> >
> > "some_variable" "some_schema"."some_type";
> > BEGIN
> > SELECT current_setting('search_path') INTO "old_search_path";
> > PERFORM set_config(
> > 'search_path',
> > 'some_schema, pg_temp, ' || "old_search_path",
> > TRUE
> > );
> >
> > ----------------------------------------------------------------
> > -- Do I have to fully qualify types and operators from
> > -- "myschema" here? Or is it safe to not fully qualify them?
(correction: "some_schema")
> > ----------------------------------------------------------------
> > END;
> > $$;
> >
> > That is my overall idea.
>
> Is 'some_schema' a known item when installing?

Yes, fortunately "some_schema" is a fixed name.

>
> Once you have the search_path defined and assuming all the objects you
> want are in that path, then yes you can drop the schema qualification.

That would be nice, but it doesn't seem to be the case. At least not
always. I constructed the following new example:

============

CREATE TABLE "tbl" ("col" NUMERIC(15, 0));

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
BEGIN
RETURN '2.4';
END;
$$;

BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO 'myschema';

CREATE TABLE "tbl" ("col" NUMERIC);

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
BEGIN
RETURN '5.4';
END;
$$;

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
DECLARE
"old_search_path" TEXT;
BEGIN
"old_search_path" := current_setting('search_path');
SET LOCAL search_path TO "myschema";
-- At this point, search_path is always set to 'myschema'!
DECLARE
"variable" "tbl"."col"%TYPE;
BEGIN
"variable" := "foo"();
RETURN "variable";
END;
PERFORM set_config('search_path', "old_search_path", TRUE);
END;
$$;

COMMIT;

SELECT "myschema"."run"(); -- returns '5.4' (when run in the same session)

-- reconnect to database here:
\c

SELECT "myschema"."run"(); -- returns '5'
SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5'

-- reconnect to database again:
\c

SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5.4'
SET search_path TO 'public';
SELECT "myschema"."run"(); -- returns '5.4'

============

Even if

DECLARE "variable" "tbl"."col"%TYPE;

follows *after* the schema is set to "myschema" in the example above, I
still get differing results, depending on how the search_path was set
when the function was first called.

I think this has to do with the fact that the overall structure and
probably types(?) are parsed first?

As Tom Lane wrote on Fri, 27 Dec 2024 16:03:17 -0500, "the types of
plpgsql variables are only looked up on the first use (within a
session)."

Does this apply to *all* types (e.g. types used in type-casts in
statements after BEGIN)? Or does it only apply to types in the DECLARE
section?

Maybe my most recent example is somewhat "crafted", but it makes me
feel insecure about what I can rely on. Could someone explain to me
what the exact rules are, or where to find them? I don't seem to
understand the exact behavior from reading the docs.

I re-read section 41.11.2. on Plan Caching:

"The PL/pgSQL interpreter parses the function's source text and
produces an internal binary instruction tree the first time the
function is called (within each session). The instruction tree fully
translates the PL/pgSQL statement structure, but individual SQL
expressions and SQL commands used in the function are not translated
immediately.
As each expression and SQL command is first executed in the function,
the PL/pgSQL interpreter parses and analyzes the command to create a
prepared statement, using the SPI manager's SPI_prepare function.
Subsequent visits to that expression or command reuse the prepared
statement. [...]"

It isn't specific about how DECLARE blocks are handled.

>
> >
> > My problem is that I'm confused about WHEN EXACTLY I have to qualify
> > tables/types, etc. It is very hard to understand from reading (just) the
> > documentation.
>
> If you are doing this as an extension then I suspect you want the
> processes shown here:
>
> https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-RELOCATION

Yes, I'm aware of that, but the code defining the function is not part
of an extension in my case (only the RATIONAL type is part of an
extension). But thank you for pointing this out. I have been using
@extschema@ in extension code before, and recently also learned about
the @extschema:name@ syntax.

> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com

Many thanks already for all your input.

Regards,
Jan Behrens

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2025-01-05 00:04:49 Re: search_path for PL/pgSQL functions partially cached?
Previous Message Laurenz Albe 2025-01-04 21:02:53 Re: name difference of rpm packages of postgresql-private-devel-16.6.z