Re: Schema variables - new implementation for Postgres 15

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, dean(dot)a(dot)rasheed(at)gmail(dot)com, er(at)xs4all(dot)nl, joel(at)compiler(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Schema variables - new implementation for Postgres 15
Date: 2024-05-24 13:00:48
Message-ID: CAFj8pRB=ON4n1xbmXxW36bowW2r904txn+1N1wC_oSmNYLqKKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

pá 24. 5. 2024 v 13:32 odesílatel Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
napsal:

> > On Wed, May 22, 2024 at 08:44:28PM +0200, Pavel Stehule wrote:
> > st 22. 5. 2024 v 19:25 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:
> >
> > > Peter Eisentraut <peter(at)eisentraut(dot)org> writes:
> > > > On 18.05.24 13:29, Alvaro Herrera wrote:
> > > >> I want to note that when we discussed this patch series at the dev
> > > >> meeting in FOSDEM, a sort-of conclusion was reached that we didn't
> want
> > > >> schema variables at all because of the fact that creating a variable
> > > >> would potentially change the meaning of queries by shadowing table
> > > >> columns. But this turns out to be incorrect: it's_variables_ that
> are
> > > >> shadowed by table columns, not the other way around.
> > >
> > > > But that's still bad, because seemingly unrelated schema changes can
> > > > make variables appear and disappear. For example, if you have
> > > > SELECT a, b FROM table1
> > > > and then you drop column b, maybe the above query continues to work
> > > > because there is also a variable b.
> > >
> > > Yeah, that seems pretty dangerous. Could we make it safe enough
> > > by requiring some qualification on variable names? That is, if
> > > you mean b to be a variable, then you must write something like
> > >
> > > SELECT a, pg_variables.b FROM table1
> > >
> > > This is still ambiguous if you use "pg_variables" as a table alias in
> > > the query, but the alias would win so the query still means what it
> > > meant before. Also, table aliases (as opposed to actual table names)
> > > don't change readily, so I don't think there's much risk of the query
> > > suddenly meaning something different than it did yesterday.
> > >
> >
> > With active shadowing variable warning for described example you will
> get a
> > warning before dropping.
>
> I assume you're talking about a warning, which one will get querying the
> table with shadowed columns. If no such query has happened yet and the
> column was dropped, there will be no warning.
>

sure - the possible identifier collision cannot be solved in SQL perfectly.
It is the same with tables.
When I add badly named column to table, I'll get an error "ambiguous
columns" just when I'll execute
query. The system catalog just cannot protect against collisions - it is
true for columns, variables, tables.
Little bit protected are views, that are stored in parsed format, but any
other object can be broken when
somebody choose bad names in catalog or queries. There is not any
protection.

>
> Aside that, I'm afraid dropping a warning in log does not have
> sufficient visibility to warn about the issue, since one needs to read
> those logs first. I guess what folks are looking for is more constraints
> out of the box, preventing any ambiguity.
>

We can increase (optionality) the level of this message to error. It is not
perfect, but it can work well.

I think so there is not higher risk with variables than current risk with
just tables.

a) the possibility to create variables is limited by rights on schema. So
nobody can create variables without necessary rights (invisibly)

b) if user has own schema with CREATE right, then it can create variables
just for self, and with default setting, just visible for self,
and just accessible for self. When other users try to use these variables,
then the query fails due to missing access rights (usually).
Common user cannot to create variables in application schema and cannot to
set search_path for applications.

c) the changes of schema are usually tested on some testing stages before
are applied on production. So when there
will be possible collision or some other defect, probably it will be
detected there. Untested changes of catalog on production is not too common
today.

d) any risk that can be related for variables, is related just to renaming
column or table.

>
> > Session variables are joined with schema (in my proposal). Do anybody can
> > do just
> >
> > CREATE SCHEMA svars; -- or what (s)he likes
> > CREATE VARIABLE svars.b AS int;
> >
> > SELECT a, b FROM table1
> >
> > and if somebody can be really safe, the can write
> >
> > SELECT t.a, t.b FROM table1 t
> >
> > or
> >
> > SELECT t.a, svars.b FROM table1 t
> >
> > It can be customized in the way anybody prefers - just creating dedicated
> > schemas and setting search_path. Using its own schema for session
> variables
> > without enhancing search_path for this schema forces the necessity to set
> > only qualified names for session variables.
> >
> > Sure the naming of schemas, aliases can be unhappy wrong, and there can
> be
> > the problem. But this can be a problem today too.
>
> If I understand you correctly, you're saying that there are "best
> practices" how to deal with session variables to avoid any potential
> issues. But I think it's more user-friendly to have something that will
> not allow shooting yourself in the foot right out of the box. You're
> right, similar things could probably happen with the already existing
> functionality, but it doesn't give us rights to add more to it.
> Especially if it's going to be about a brand-new feature.
>

Unfortunately, there is not any possibility - just in SQL (without
introduction of variables).

Example - Tom's proposal using dedicated schema

ok - I can limit the possibility to create variables just for schema
"pg_var"

CREATE VARIABLE pg_var.a AS int;

but if somebody will write query like

SELECT pg_var.a FROM tab pg_var

then we are back on start.

>
> As far as I can see now, it's a major design flaw that could keep the
> patch from being accepted. Fortunately there are few good proposals how
> to address this, folks are genuinely trying to help. What do you think
> about trying some of them out, as an alternative approach, to compare
> functionality and user experience?
>

It is a design flaw of SQL. The issue we talk about is the generic property
of SQL, and then you cannot fix it.

I thought about possibility to introduce dedicated function

svalue(regvariable) returns any - with planner support

and possibility to force usage of this function. Another possibility is
using some simple dedicated operator (syntax) for force using of variables
so theoretically this can looks like:

set strict_usage_of_session_variables to on;
SELECT * FROM tab WHERE a = svalue('myvar.var');
or

SELECT * FROM tab WHERE a = @ myvar.var;

This can be really safe. Personally It is not my cup of tea, but I can live
it (and this mode can be default).

Theoretically we can limit usage of variables just for PL/pgSQL. It can
reduce risks too, but it breaks usage variables for parametrization of DO
blocks (what is my primary motivation), but it can be good enough to
support migration from PL/SQL.

>
> In the meantime I'm afraid I have to withdraw "Ready for committer"
> status, sorry. I've clearly underestimated the importance of variables
> shadowing, thanks Alvaro and Peter for pointing out some dangerous
> cases. I still believe though that the majority of the patch is in a
> good shape and the question about variables shadowing is the only thing
> that keeps it from moving forward.
>

I understand.

I'll try to recapitulate my objections against proposed designs

a) using syntax like MS - DECLARE command and '@@' prefix - it is dynamic,
so there is not possibility of static check. It is not joined with schema,
so there are possible collisions between variables and and the end the
variables are named like @@mypackage_myvar - so some custom naming
convention is necessary too. There is not possibility to set access rights.

b) using variables like MySQL - first usage define it, and access by '@'
prefix. It is simple, but without possibility of static check. There is not
possibility to set access rights.

c) using variables with necessity to define it in FROM clause. It is safe,
but it can be less readable, when you use more variables, and it is not too
readable, and user friendly, because you need to write FROM. And can be
messy, because you usually will use variables in queries, and it is
introduce not relations into FROM clause. But I can imagine this mode as
alternative syntax, but it is very unfriendly and not intuitive (I think).
More probably it doesn't fast execution in simple expression execution mode.

d) my proposal - there is possibility of collisions, but consistent with
naming of database objects, allows set of access rights, allows static
analyze, consistent with PL/pgSQL and similar to PL/pgSQL.

There is not any other possibility. Any time this is war between be user
friendly, be readable, be correctly - but there is not perfect solution,
because just SQL is not perfect. Almost all mentioned objections against
proposed variables are valid just for tables and columns.

Regards

Pavel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2024-05-24 13:08:22 Re: Wrong results with grouping sets
Previous Message Yao Wang 2024-05-24 12:50:54 Re: 回复: An implementation of multi-key sort