From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, adam(at)labkey(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Date: | 2024-11-21 17:09:14 |
Message-ID: | Zz9pOi3pGF-DnJTp@nathan |
Lists: | pgsql-bugs |
On Thu, Nov 21, 2024 at 11:44:44AM -0500, Bruce Momjian wrote:
> On Thu, Nov 21, 2024 at 09:14:23AM -0600, Nathan Bossart wrote:
>> Tom provided a concise explanation upthread [0]. My understanding is the
>> same as Bertrand's, i.e., this is an easy way to rule out a bunch of cases
>> where we know that we couldn't possibly have truncated in the middle of a
>> multi-byte character. This allows us to avoid doing multiple pg_database
>> lookups.
>
> Where does Tom mention anything about checking two bytes?
Here [0]. And he further elaborated on this idea here [1].
> He is
> basically saying remove all trailing high-bit characters until you get a
> match, because once you get a match, you have found the point of
> valid truncation for the encoding.
Yes, we still need to do that if the truncation might have cut off part of
a multi-byte character. But truncation cannot have split a multi-byte
character if the byte at position NAMEDATALEN-1 or NAMEDATALEN-2 is ASCII,
and in that case we can skip the extra lookups.
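For illustration, a minimal C sketch of that fast-path check. It assumes the default NAMEDATALEN of 64, reads NAMEDATALEN-1 and NAMEDATALEN-2 as 0-based offsets into the original (untruncated) name, and assumes a server encoding in which every byte of a multi-byte character has its high bit set. The helper name truncation_might_split_char() is hypothetical and is not the function in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64      /* assumed: PostgreSQL's default; names keep at most 63 bytes */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/*
 * Hypothetical helper: returns true if truncating 'name' to NAMEDATALEN - 1
 * bytes could have cut a multi-byte character in half.  Assumes every byte
 * of a multi-byte character has its high bit set.
 */
static bool
truncation_might_split_char(const char *name)
{
    size_t len = strlen(name);

    if (len < NAMEDATALEN)
        return false;           /* nothing was truncated */

    /*
     * name[NAMEDATALEN - 2] is the last byte kept and name[NAMEDATALEN - 1]
     * is the first byte dropped.  If either one is ASCII, the cut falls on a
     * character boundary.
     */
    return IS_HIGHBIT_SET(name[NAMEDATALEN - 2]) &&
           IS_HIGHBIT_SET(name[NAMEDATALEN - 1]);
}
```

Note that a true result only means a split is possible: two adjacent multi-byte characters also put high-bit bytes on both sides of the cut.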
> This text:
>
> * If the original name is too long and we see two consecutive bytes
> * with their high bits set at the truncation point, we might have
> * truncated in the middle of a multibyte character. In multibyte
> * encodings, every byte of a multibyte character has its high bit
> * set. So if IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and
> * NAMEDATALEN-2, we know we're in the middle of a multibyte
> * character. We need to try truncating one more byte back to find the
> * start of the next character.
>
> needs to be fixed, at a minimum, specifically, "So if IS_HIGHBIT_SET is
> true for both NAMEDATALEN-1 and NAMEDATALEN-2, we know we're in the
> middle of a multibyte character."
Agreed, the second-to-last sentence should be adjusted to something like
"we might be in the middle of a multibyte character." We don't know for
sure.
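Under the same assumptions, the slow path Bruce describes (drop trailing high-bit bytes one at a time until a lookup matches) might look roughly like the sketch below. find_truncated_match(), the db_lookup_fn callback, demo_lookup() and MAX_CHAR_LEN (taken as 4, the longest UTF-8 sequence) are all hypothetical stand-ins; real code would search pg_database rather than compare against an in-memory string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64      /* assumed: PostgreSQL's default */
#define MAX_CHAR_LEN 4      /* assumed: longest character in bytes (UTF-8) */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Hypothetical lookup: does a database named name[0..len) exist? */
typedef bool (*db_lookup_fn)(const char *name, size_t len);

/*
 * Hypothetical: returns the length of the truncated name that matched,
 * or 0 if no candidate matched.
 */
static size_t
find_truncated_match(const char *name, db_lookup_fn lookup)
{
    size_t len = strlen(name);

    /* Short enough that nothing was truncated: a single lookup. */
    if (len < NAMEDATALEN)
        return lookup(name, len) ? len : 0;

    len = NAMEDATALEN - 1;

    /* Fast path: an ASCII byte on either side of the cut means it fell on
     * a character boundary, so one lookup suffices. */
    if (!IS_HIGHBIT_SET(name[len - 1]) || !IS_HIGHBIT_SET(name[len]))
        return lookup(name, len) ? len : 0;

    /* We might be mid-character: back off one byte at a time. */
    for (int tries = 0; tries < MAX_CHAR_LEN; tries++, len--)
    {
        if (lookup(name, len))
            return len;
        /* If the last kept byte is ASCII, len was already a character
         * boundary; a multibyte-aware truncation would not go shorter. */
        if (!IS_HIGHBIT_SET(name[len - 1]))
            break;
    }
    return 0;
}

/* Stand-in for the catalog: a single database name held in memory. */
static const char *demo_stored_name = "";

static bool
demo_lookup(const char *name, size_t len)
{
    return strlen(demo_stored_name) == len &&
           memcmp(demo_stored_name, name, len) == 0;
}
```

The loop runs only when both bytes around the cut have their high bit set, and it is bounded by MAX_CHAR_LEN attempts, so the number of extra lookups stays small.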
>> * Try to do multibyte-aware truncation (the patch at hand).
>
> Yes, I am fine with that, but we need to do more than the patch does to
> accomplish this, unless I am totally confused.
What more do you think is required?
[0] https://postgr.es/m/3976665.1732057784%40sss.pgh.pa.us
[1] https://postgr.es/m/158506.1732120196%40sss.pgh.pa.us
--
nathan