Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: adam(at)labkey(dot)com, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date: 2024-11-19 19:33:27
Message-ID: 3796535.1732044807@sss.pgh.pa.us
Lists: pgsql-bugs

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Sun, Nov 17, 2024 at 01:00:14PM -0500, Tom Lane wrote:
>> As said, the difficulty is that we don't know what encoding the
>> incoming name is meant to be in, and with multibyte encodings that
>> matters. The name actually stored in the catalog might be less
>> than 63 bytes long if it was truncated in a multibyte-aware way,
>> so that the former behavior of blindly truncating at 63 bytes
>> can still yield unexpected no-such-database results.

> I wonder if we should consider removing the identifier truncation
> altogether. Granted, it mostly works (or at least did before v17), but I'm
> not sure we'd make the same decision today if we were starting from
> scratch. IMHO it'd be better to ERROR so that users are forced to produce
> legal identifiers. That being said, I realize this behavior has been
> present for over a quarter century now [0] [1] [2], and folks are unlikely
> to be happy with even more breakage.

Yeah, I think removing it now is a non-starter.

I did think of a way that we could approximate encoding-correct
truncation here, relying on the fact that what's in pg_database
is encoding-correct according to somebody:

1. If the NAMEDATALEN-1'th byte is ASCII (high bit clear), just truncate
there and look up as usual.

2. If it's non-ASCII, truncate there and try to look up. On success,
we're good. On failure, if the next-to-last byte is non-ASCII,
truncate that too and try to look up. Repeat a maximum of
MAX_MULTIBYTE_CHAR_LEN-1 times before failing.
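
To make the two steps above concrete, here is a minimal sketch in Python rather than backend C. It is one reading of the proposal, not the committed fix: the function name `resolve_database_name` and the `catalog` set (standing in for a lookup against pg_database.datname) are hypothetical, and the constants are assumed to match PostgreSQL's defaults (NAMEDATALEN = 64, MAX_MULTIBYTE_CHAR_LEN = 4).

```python
from typing import Collection, Optional

NAMEDATALEN = 64            # assumed: PostgreSQL's default build value
MAX_MULTIBYTE_CHAR_LEN = 4  # assumed: longest character in any supported encoding


def is_non_ascii(byte: int) -> bool:
    """True when the high bit is set, i.e. the byte is part of a multibyte character."""
    return (byte & 0x80) != 0


def resolve_database_name(requested: bytes,
                          catalog: Collection[bytes]) -> Optional[bytes]:
    """Hypothetical helper: return the catalog name matching `requested`, or None.

    `catalog` stands in for the set of names stored in pg_database.datname.
    """
    limit = NAMEDATALEN - 1

    # Names that already fit are looked up unchanged.
    if len(requested) <= limit:
        return requested if requested in catalog else None

    candidate = requested[:limit]

    # Step 1: the last kept byte is ASCII, so the cut cannot split a character.
    if not is_non_ascii(candidate[-1]):
        return candidate if candidate in catalog else None

    # Step 2: the cut may have split a multibyte character. Try the blind
    # truncation first, then drop trailing non-ASCII bytes one at a time.
    if candidate in catalog:
        return candidate
    for _ in range(MAX_MULTIBYTE_CHAR_LEN - 1):
        if not is_non_ascii(candidate[-1]):
            break
        candidate = candidate[:-1]
        if candidate in catalog:
            return candidate
    return None
```

For example, if a client sends 62 ASCII bytes followed by two 2-byte UTF-8 characters, an encoding-aware truncation would have stored only the 62 ASCII bytes; the sketch finds that entry on the first retry, whereas a blind 63-byte truncation would not. Note that no knowledge of the client's encoding is needed, only the high-bit test.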

I think this works unconditionally so long as all entries in
pg_database.datname are in the same encoding. If there's a
mixture of encodings (which we don't forbid) then in principle
you could probably select a database other than the one the
client thought it was asking for. But that seems mighty
improbable, and the answer can always be "so connect using
the name as it appears in the catalog".

It's ugly of course. But considering that we got a complaint
so quickly after v17 release, I'm not sure we can just camp on
562bee0fc as being an acceptable answer.

regards, tom lane
