Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, adam(at)labkey(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date: 2024-12-03 07:38:56
Message-ID: Z061kNj9z0Nic5EF@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On Mon, Dec 02, 2024 at 06:39:48PM +1300, Thomas Munro wrote:
> On Fri, Nov 29, 2024 at 11:15 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > hm, are you saying that when choosing PG_SQL_ASCII to represent Mode 2
> > (ASCII-only with validation) in the shared_catalog_encoding field,
> > we're giving it a different semantic meaning than it has elsewhere in the
> > system? Maybe we could just document very clearly that its meaning in
> > shared_catalog_encoding is special/different?
>
> Documentation seems enough to me. Unless you think it would be useful
> outside this use case?

No, I also think that documentation update should be enough.

> Hmm, in theory I suppose validated ASCII7 would be the ideal encoding
> for template0, or generally any template that is allowed to be cloned
> into a new database with a different encoding. It would enforce
> something that is just a promise or convention today.

I agree since all ASCII7 characters should be valid in the other encoding.

> I don't think
> that is really a pressing issue in itself, the convention holds up
> pretty well, but it's conceptually quite closely related, so it's at
> least interesting to think about. I dunno.

Yeah, while it's true that SQL_ASCII is more like "no encoding" (no validation
and allows any byte sequence), I also don't think there is pressure to enforce
"real" ASCII7 for template0.

> > > $ CREATE ROLE lætitia;
> > > ERROR: role name "lætitia" cannot be represented in the shared catalog
> > > encoding SQL_ASCII
> > > HINT: To allow non-ASCII roles, shared_catalog_encoding must be set to
> > > an encoding matching all databases (or UNKNOWN, not recommended)
> >
> > That would sound reasonable to me.
>
> Cool. You're both telling me this sounds worth trying out, so I
> hacked up a proof-of-concept. Please see attached.

Thanks for the patch!

> Almost half of the patch is tedious validation code that knows where
> all the strings are hiding in the shared catalogs (and TODOs remain).
> I wonder if that part could be automated somehow with schema analysis.
> On the other hand, that stuff is not changing in a hurry, and you get
> to write friendly actionable messages if you do it manually, for

I also think the user experience is important here (better error messages) over
maintenance simplicity (automated schema analysis).

Maybe we can try an "hybrid" approach that could simplify the AlterSystemCatalogEncoding()
by relying on a new struct, say:

"
typedef struct {
Oid relationId;
const char *relationName;
struct {
AttrNumber attrNum;
bool isName;
const char *fieldDesc;
const char *hintMsg;
} attrs[4];
} CatalogValidationDef;
"

And then let the validation iterates over an array of CatalogValidationDef?

"
static const CatalogValidationDef validationDefs[] = {
{
AuthIdRelationId,
"role",
{{Anum_pg_authid_rolname, true, "name",
"Consider ALTER ROLE ... RENAME TO ... using characters valid in SQL_ASCII."}}
},
.
.
"
And use schema analysis "only" to verify completeness (catch missing fields)?

> I'm on the fence about the default. Here I defaulted to
> single-encoding, figuring that multi-encoding clusters are possibly a
> fairly advanced configuration and that that user group could just run
> one extra command, and there's even a HINT.

I think that makes sense. It makes the default "simple" and the complex case
possible considering multi-encoding an advanced configuration.

> Despite being a really simple idea for adding one new switch that
> really only "stops" things rather than actually doing anything new, it
> finishes up interacting with locking, logging, checkpointing, control
> file, redo, grammar, pg_upgrade (and I see now that pg_dumpall might
> need a similar approach),

Yeah, indeed the impact is not negligible and that's probably one more reason
to "just" keep the "only stops" behavior.

> and I didn't even look at the startup stuff
> you guys were working on that can hopefully benefit from having
> GetSharedCatalogEncoding(). Hopefully the patch has the right sort of
> ideas in some of those places, but obviously it's a rapid prototype so
> might be way off on some of the details.

I'll look more in details in the near future (unless someone has cons about your
suggested approach, which is not my case).

Do you think it's worth to move the discussion into a dedicated hackers thread?
(maybe reaching a wider audience?) I think the subject is sensible enough.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2024-12-03 08:03:55 Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Previous Message Sandeep Thakkar 2024-12-03 06:03:20 Re: BUG #18707: Installation issue