
Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, adam(at)labkey(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date:	2024-11-21 16:44:44
Message-ID:	Zz9jfOkVmlYcYHSy@momjian.us
Lists:	pgsql-bugs

On Thu, Nov 21, 2024 at 09:14:23AM -0600, Nathan Bossart wrote:
> On Thu, Nov 21, 2024 at 09:47:56AM -0500, Bruce Momjian wrote:
> > On Thu, Nov 21, 2024 at 02:35:50PM +0000, Bertrand Drouvot wrote:
> >> On Thu, Nov 21, 2024 at 09:21:16AM -0500, Bruce Momjian wrote:
> >> > I don't understand this logic. Why are two bytes important? If we knew
> >> > it was UTF8 we could check for non-first bytes always starting with
> >> > bits 10, but we can't know that.
> >>
> >> I think this is because this is a reliable way to detect if the truncation happened
> >> in the middle of a character, without needing to know the specifics of the encoding.
> >>
> >> My understanding is that the key insight is that in any multibyte encoding, all
> >> bytes within a multibyte character will have their high bits set.
> >>
> >> That's just my understanding from the code and Tom's previous explanations: I
> >> might be wrong as I am not an expert in this area.
> >
> > But the logic doesn't make sense. Why would two bytes be any different
> > than one?
>
> Tom provided a concise explanation upthread [0]. My understanding is the
> same as Bertrand's, i.e., this is an easy way to rule out a bunch of cases
> where we know that we couldn't possibly have truncated in the middle of a
> multi-byte character. This allows us to avoid doing multiple pg_database
> lookups.

Where does Tom mention anything about checking two bytes? He is
basically saying to remove trailing high-bit bytes until you get a
match, because once you get a match, you have found the point of
valid truncation for the encoding. In fact, here he specifically talks
about MAX_MULTIBYTE_CHAR_LEN-1:

https://www.postgresql.org/message-id/3796535.1732044807%40sss.pgh.pa.us

This comment in the patch:

* If the original name is too long and we see two consecutive bytes
* with their high bits set at the truncation point, we might have
* truncated in the middle of a multibyte character. In multibyte
* encodings, every byte of a multibyte character has its high bit
* set. So if IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and
* NAMEDATALEN-2, we know we're in the middle of a multibyte
* character. We need to try truncating one more byte back to find the
* start of the next character.

needs to be fixed at a minimum, specifically this sentence: "So if
IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and NAMEDATALEN-2, we know
we're in the middle of a multibyte character."

> > I assumed you would just remove all trailing high-bit bytes
> > and stop at the first non-high-bit byte.
>
> I think this risks truncating more than one multi-byte character, which
> would cause the login path to truncate differently than the CREATE/ALTER
> DATABASE path (which is encoding-aware).

True, we can stop after stripping at most MAX_MULTIBYTE_CHAR_LEN-1 bytes
and know there is no match.

> * Try to do multibyte-aware truncation (the patch at hand).

Yes, I am fine with that, but we need to do more than the patch does to
accomplish this, unless I am totally confused.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"

In response to

Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails at 2024-11-21 15:14:23 from Nathan Bossart

Responses

Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails at 2024-11-21 17:09:14 from Nathan Bossart
