Re: proposal: schema variables

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, DUVAL REMI <REMI(dot)DUVAL(at)cheops(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: proposal: schema variables
Date: 2024-08-27 06:15:42
Message-ID: 04ec666686e9e21cb515617df06885c66f3d34ce.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2024-08-15 at 07:55 +0200, Pavel Stehule wrote:
> út 30. 7. 2024 v 21:46 odesílatel Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> napsal:
> > - A general reminder: single line comments should start with a lower case
> >   letter and have to period in the end:
>
> Should it be "have not to period in the end" ?

I made a typo. I mean "have *no* period in the end".

> I fixed almost all without parts related to psql and executor - there almost all
> current comments break the mentioned rule. So I use the style used in these files.
> I can fix it (if you like it) - or it can be fixed generally in a separated patch set.

Thanks. I also noticed that a lot of existing comments break that rule,
so I think that it is fine if you stick with the way that the surrounding
code does it.

> > - Variable as parameter:
> >
> >   CREATE VARIABLE var AS date;
> >   LET var = current_date;
> >   PREPARE stmt(date) AS SELECT $1;
> >   EXECUTE stmt(var);
> >   ERROR:  paramid of PARAM_VARIABLE param is out of range
> >
> >   Is that working as intended?  I don't understand the error message.
>
> fixed in 0002 patch (variables cannot be used as EXECUTE argument) and 0014 (enable usage variables as an argument of EXECUTE)

Thanks.

> >   > --- a/src/backend/catalog/namespace.c
> >   > +++ b/src/backend/catalog/namespace.c
> >   > [...]
> >   > +/*
> >   > + * IdentifyVariable - try to find variable identified by list of names.
> >   > [...]
> >
> >   Perhaps part of the reason why this function is so complicated is that
> >   you support things like this:
> >
> >     CREATE SCHEMA sch;
> >     CREATE TYPE cp AS (a integer, b integer);
> >     CREATE VARIABLE sch.v AS cp;
> >     LET sch.v = (1, 2);
> >     SELECT sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >     test=# SELECT test.sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >   We don't support that for tables:
> >
> >     CREATE TABLE tab (c cp);
> >     INSERT INTO tab VALUES (ROW(42, 43));
> >     SELECT tab.c.b FROM tab;
> >     ERROR:  cross-database references are not implemented: tab.c.b
> >
> >   You have to use extra parentheses:
> >
> >     SELECT (tab.c).b FROM tab;
> >      b 
> >     ════
> >      43
> >     (1 row)
> >
> >   I'd say that this should be the same for session variables.
> >   What do you think?
>
> I prefer the current state, but I don't have a very strong opinion about it.
> It can save 115 lines of almost trivial code, but I think the user experience
> can be much worse.  Using composite types in tables is not too common a
> pattern (and when I read some recommendations for Oracle, it is an antipattern),
> but usage of composite variables is common (it can hold a row). Moreover,
> we talked long about possible identifier's collisions, and the pattern
> schema.variable is very good protection against possible collisions.
> Probably the pattern catalog.schema.variable.field is not too interesting
> but the support has 50 lines.
>
> More, the plpgsql supports pattern label.variable.field, so can be little bit
> unfriendly if session variables doesn't support "similar" pattern

I see your point, and I'm fine with leaving it as it is.

>
>
> > - src/backend/commands/session_variable.c
> >
> >   There is one comment that confuses me and that I did not edit:
> >
> >   > +Datum
> >   > +GetSessionVariable(Oid varid, bool *isNull, Oid *typid)
> >   > +{
> >   > +   SVariable   svar;
> >   > +
> >   > +   svar = get_session_variable(varid);
> >   > +
> >   > +   /*
> >   > +    * Although svar is freshly validated in this point, the svar->is_valid can
> >   > +    * be false, due possible accepting invalidation message inside domain
> >   > +    * check. Now, the validation is done after lock, that can also accept
> >   > +    * invalidation message, so validation should be trustful.
> >   > +    *
> >   > +    * For now, we don't need to repeat validation. Only svar should be valid
> >   > +    * pointer.
> >   > +    */
>
> This comment is related to assertions. Before I had there `Assert(svar->is_valid)`,
> because I expected it. But it was not always true. And although it is true,
> we don't need to validate a variable, because at this moment, the variable
> should be locked, and then we can return content safely.

I guess my main problem is the word "trustful". I don't recognize that word.
Perhaps you can reword the comment along the lines of your above explanation.

>
> > - src/backend/executor/execMain.c
> >
> >   > @@ -196,6 +198,51 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
> >   >     Assert(queryDesc->sourceText != NULL);
> >   >     estate->es_sourceText = queryDesc->sourceText;
> >   >
> >   > +   /*
> >   > +    * The executor doesn't work with session variables directly. Values of
> >   > +    * related session variables are copied to dedicated array, and this array
> >   > +    * is passed to executor.
> >   > +    */
> >
> >   Why?  Is that a performance measure?  The comment should explain that.
>
> Session variables are volatile internally - some function that uses LET statements
> can be called more times inside a query. This behavior is not consistent with
> plpgsql's variables or external parameters. And if we want to support parallel
> queries, then volatile session variables can be pretty messy (and unusable).
> If we want the same behaviour for queries with or without parallel support,
> then the session variables should be "stable" (like stable functions).
> Simple implementation is using "snapshot" of values of used session variables
> when query is started. This "snapshot" is immutable, so we don't need to make
> a copy more times.
>
> I changed this comment

Thanks.

> > - parallel safety
> >
> >   > --- a/src/backend/optimizer/util/clauses.c
> >   > +++ b/src/backend/optimizer/util/clauses.c
> >   > @@ -935,6 +936,13 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
> >   >         if (param->paramkind == PARAM_EXTERN)
> >   >             return false;
> >   >
> >   > +       /* We doesn't support passing session variables to workers */
> >   > +       if (param->paramkind == PARAM_VARIABLE)
> >   > +       {
> >   > +           if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> >   > +               return true;
> >   > +       }
> >
> >   Even if a later patch alleviates that restriction, this patch should document that
> >   session variables imply "parallel restricted".  I have added that to doc/src/sgml/parallel.sgml.
> >
>
> merged (and removed in 0015)

Thanks.

I hope I can do some more review at some point in the future...

I sincerely hope that this patch set gets merged at some point.
The big obstacle is that it is so large. That's probably because it is pretty
feature-complete (but, as we have found, not totally free of bugs).

Judging from the amount of time I put into my review so far, I guess that this
patch set would keep a committer busy for several weeks. Perhaps the only way to
get this done would be to make you a committer...

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message chungui.wcg 2024-08-27 06:43:51 Re:allowing extensions to control planner behavior
Previous Message Justin Clift 2024-08-27 06:00:13 Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.