Re: Schema variables - new implementation for Postgres 15

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Joel Jacobson <joel(at)compiler(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Schema variables - new implementation for Postgres 15
Date: 2022-01-19 08:01:17
Message-ID: 20220119080117.4c2fjzs5p7vtdxhk@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Jan 18, 2022 at 10:01:01PM +0100, Pavel Stehule wrote:
> pá 14. 1. 2022 v 3:44 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com> napsal:
>
> >
> > =# let myvariable = 'AA';
> > LET
> >
> > =# select 'AA' collate "en-x-icu" < myvariable;
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > =# select 'AA' collate "en-x-icu" < myvariable collate mycollation;
> > ERROR: 42P21: collation mismatch between explicit collations "en-x-icu"
> > and "mycollation"
> > LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat...
> >
>
> What do you expect? I don't understand collating well, but it looks
> correct. Minimally the tables have the same behavior.

Indeed, I actually didn't know that such object's collation were implicit and
could be overloaded without a problem as long as there's no conflict between
all the explicit collations. So I agree that the current behavior is ok,
including a correct handling for wanted conflicts:

=# create variable var1 text collate "fr-x-icu";
CREATE VARIABLE

=# create variable var2 text collate "en-x-icu";
CREATE VARIABLE

=# let var1 = 'hoho';
LET

=# let var2 = 'hoho';
LET

=# select var1 < var2;
ERROR: 42P22: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.

> Please, can you check the attached patches?

All the issue I mentioned are fixed, thanks!

I see a few problems with the other new features added though. The new
session_variables_ambiguity_warning GUC is called even in contexts where it
shouldn't apply. For instance:

=# set session_variables_ambiguity_warning = 1;
SET

=# create variable v text;
CREATE VARIABLE

=# DO $$
DECLARE v text;
BEGIN
v := 'test';
RAISE NOTICE 'v: %', v;
END;
$$ LANGUAGE plpgsql;
WARNING: 42702: session variable "v" is shadowed by column
LINE 1: v := 'test'
^
DETAIL: The identifier can be column reference or session variable reference.
HINT: The column reference is preferred against session variable reference.
QUERY: v := 'test'

But this "v := 'test'" shouldn't be a substitute for a LET, and it indeed
doesn't work:

=# DO $$
BEGIN
v := 'test';
RAISE NOTICE 'v: %', v;
END;
$$ LANGUAGE plpgsql;
ERROR: 42601: "v" is not a known variable
LINE 3: v := 'test';

But the RAISE NOTICE does see the session variable (which should be the correct
behavior I think), so the warning should have been raised for this instruction
(and in that case the message is incorrect, as it's not shadowing a column).

Also, the pg_dump handling emits a COLLATION option for session variables even
for default collation, while it should only emit it if the collation is not the
type's default collation. As a reference, for attributes the SQL used is:

"CASE WHEN a.attcollation <> t.typcollation "
"THEN a.attcollation ELSE 0 END AS attcollation,\n"

Also, should \dV or \dV+ show the collation?

And a few comments on the new chunks in this version of the patch (I didn't
look in detail at the whole patch yet):

+ <para>
+ The session variables can be overshadowed by columns in an query. When query
+ holds identifier or qualified identifier that can be used as session variable
+ identifier and as column identifier too, then it is used as column identifier
+ every time. This situation can be logged by enabling configuration
+ parameter <xref linkend="guc-session-variables-ambiguity-warning"/>.
+ </para>

Is "overshadowed" correct? The rest of the patch only says "shadow(ed)".

While at it, here's some proposition to improve the phrasing:

+ The session variables can be shadowed by column references in a query. When a
+ query contains identifiers or qualified identifiers that could be used as both
+ a session variable identifiers and as column identifier, then the column
+ identifier is preferred every time. Warnings can be emitted when this situation
+ happens by enabling configuration parameter <xref
+ linkend="guc-session-variables-ambiguity-warning"/>.

Similarly, the next documentation could be rephrased to:

+ When on, a warning is raised when any identifier in a query could be used as both
+ a column identifier or a session variable identifier.
+ The default is <literal>off</literal>.

Few other nitpicking:

+ * If we really detect collision of column and variable identifier,
+ * then we prefer column, because we don't want to allow to break
+ * an existing valid queries by new variable.

s/an existing/existing

+-- it is ambigonuous, but columns are preferred

ambiguous?

@@ -369,6 +367,19 @@ VariableCreate(const char *varName,
/* dependency on extension */
recordDependencyOnCurrentExtension(&myself, false);

+ /*
+ * Normal dependency from a domain to its collation. We know the default
+ * collation is pinned, so don't bother recording it.
+ */
+ if (OidIsValid(varCollation) &&
+ varCollation != DEFAULT_COLLATION_OID)

The comment mentions domains rather than session variables.

And for the initial patch, while looking around I found this comment on
fix_alternative_subplan():

@@ -1866,7 +1969,9 @@ fix_alternative_subplan(PlannerInfo *root, AlternativeSubPlan *asplan,
* replacing Aggref nodes that should be replaced by initplan output Params,
* choosing the best implementation for AlternativeSubPlans,
* looking up operator opcode info for OpExpr and related nodes,
- * and adding OIDs from regclass Const nodes into root->glob->relationOids.
+ * and adding OIDs from regclass Const nodes into root->glob->relationOids,
+ * and replacing PARAM_VARIABLE paramid, that is the oid of the session variable
+ * to offset the array by query used session variables. ???

I don't really understand the comment, and the "???" looks a bit suspicious.
I'm assuming it's a reference to this new behavior in fix_param_node():

* fix_param_node
* Do set_plan_references processing on a Param
+ * Collect session variables list and replace variable oid by
+ * index to collected list.
*
* If it's a PARAM_MULTIEXPR, replace it with the appropriate Param from
* root->multiexpr_params; otherwise no change is needed.
* Just for paranoia's sake, we make a copy of the node in either case.
+ *
+ * If it's a PARAM_VARIABLE, then we should to calculate paramid.

Some improvement on the comments would be welcome there, probably including
some mention to the "glob->sessionVariables" collected list?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-01-19 08:13:18 Re: pg_upgrade should truncate/remove its logs before running
Previous Message Amit Langote 2022-01-19 07:59:00 Re: a misbehavior of partition row movement (?)