Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

From: Gavan Schneider <list(dot)pg(dot)gavan(at)pendari(dot)org>
To: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
Cc: Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE
Date: 2022-10-06 08:49:58
Message-ID: A15C12F9-029C-4B03-9633-FE9D5B176CD1@pendari.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 6 Oct 2022, at 16:04, Bryn Llewellyn wrote:

> Does this imply a risk that a future PG version will go against the
> SQL standard and reject any non-latin name that is free of all
> punctuation characters, when used in the role of a SQL identifier,
> unless it's double quoted?
>
From my perspective this thread seems to miss the essential purposes
behind quote_ident(). It is part of processing external/user input —
1. Protecting from PostgreSQL which always maps everything to lower case
before anything gets to the parser
2. Protection against SQL injection when processing input from outside
the trusted perimeter

Expecting an arbitrary string to be equal to itself after it has been
through string processing code is risky unless that processing is part
of the design, and quote_ident() was never designed to be part of any
such arrangement.

Expanding —

1. It is a complex question what happens to non-ASCII characters when
they are mapped to lower case… sometimes this is a meaningful concept
e.g., ∏ -> π, sometimes it is not, e.g., pick any
Chinese/Korean/Japanese character. If the designer decides to use
non-ASCII characters in the identifier they can… just double-quote
those identifiers. If the designer wants to use camelCase ASCII they
can, but the identifier will be camelcase inside the machine unless it
was double quoted.

AFAIK we never really use quote_ident() except to process external
input. As noted above this function is not designed to be part of an
equality test when attempting system introspection, rather —

2. The simple quote_ident() function can also be used to wrap untrusted
input so it will not mess with the parser. It is used with
quote_literal() when building dynamic SQL statements from user (i.e.,
untrusted) input.

From my perspective any use of these function outside their scope is
just that… outside their scope, with no promise this usage will work
or comply with any current or future standard, or imply anything useful
about pretty much anything.

Maybe I’m oversimplifying but I believe the current functions work and
do their specific jobs, and have nothing to do with anything else. So
there is no surprise for me in the subject line. There is mild surprise
the question was asked.

BTW this ignores whether or not PG mapping everything that’s not
quoted to lower case is standards compliant. This whole topic would be
simpler if the case was left alone but that’s a long road ago and I
believe most of the bridges have been burnt :)

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a
well-known solution to every human problem — neat, plausible, and
wrong.
— H. L. Mencken, 1920

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2022-10-06 09:16:30 Aw: Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE
Previous Message Peter Smith 2022-10-06 08:30:47 Re: Support logical replication of DDLs