Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, adam(at)labkey(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date: 2024-12-02 05:39:48
Message-ID: CA+hUKG+uJGt2tks0jyS=Gkjd0Uig-9cORVm-_igp50m7QwKrWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Nov 29, 2024 at 11:15 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> hm, are you saying that when choosing PG_SQL_ASCII to represent Mode 2
> (ASCII-only with validation) in the shared_catalog_encoding field,
> we're giving it a different semantic meaning than it has elsewhere in the
> system? Maybe we could just document very clearly that its meaning in
> shared_catalog_encoding is special/different?

Documentation seems enough to me. Unless you think it would be useful
outside this use case?

Hmm, in theory I suppose validated ASCII7 would be the ideal encoding
for template0, or generally any template that is allowed to be cloned
into a new database with a different encoding. It would enforce
something that is just a promise or convention today. I don't think
that is really a pressing issue in itself, the convention holds up
pretty well, but it's conceptually quite closely related, so it's at
least interesting to think about. I dunno.

> > $ CREATE ROLE lætitia;
> > ERROR: role name "lætitia" cannot be represented in the shared catalog
> > encoding SQL_ASCII
> > HINT: To allow non-ASCII roles, shared_catalog_encoding must be set to
> > an encoding matching all databases (or UNKNOWN, not recommended)
>
> That would sound reasonable to me.

Cool. You're both telling me this sounds worth trying out, so I
hacked up a proof-of-concept. Please see attached.

Almost half of the patch is tedious validation code that knows where
all the strings are hiding in the shared catalogs (and TODOs remain).
I wonder if that part could be automated somehow with schema analysis.
On the other hand, that stuff is not changing in a hurry, and you get
to write friendly actionable messages if you do it manually, for
example:

ALTER SYSTEM CATALOG ENCODING SQL_ASCII;
ERROR: role "regress_fred" has setting "application_name" with value
"café" that contains invalid characters
HINT: Consider ALTER ROLE ... SET ... TO ... using characters valid
in SQL_ASCII.

(See lots more variations and cases in regress/expected/catalog_encoding.out.)

I'm on the fence about the default. Here I defaulted to
single-encoding, figuring that multi-encoding clusters are possibly a
fairly advanced configuration and that that user group could just run
one extra command, and there's even a HINT.

I wrote some documentation as an exercise to see how practical and
defensible the idea seemed when put into manualspeak.

Despite being a really simple idea for adding one new switch that
really only "stops" things rather than actually doing anything new, it
finishes up interacting with locking, logging, checkpointing, control
file, redo, grammar, pg_upgrade (and I see now that pg_dumpall might
need a similar approach), and I didn't even look at the startup stuff
you guys were working on that can hopefully benefit from having
GetSharedCatalogEncoding(). Hopefully the patch has the right sort of
ideas in some of those places, but obviously it's a rapid prototype so
might be way off on some of the details. It fails on Windows CI in a
very minor way that I see how to fix... later. It's enough to try out
the user experience anyway. Feedback, flames and ideas welcome.

Attachment Content-Type Size
v1-0001-Formalize-the-encoding-of-text-in-shared-catalogs.patch text/x-patch 67.7 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2024-12-02 07:07:31 BUG #18729: update with multiple subpartitions return error: too many range table entries
Previous Message Peter Geoghegan 2024-12-01 15:36:23 Re: Re: Re:Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST