From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | adam(at)labkey(dot)com, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Date: | 2024-11-20 15:35:33 |
Message-ID: | Zz4BxVqAZnWHFXFB@nathan |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, Nov 19, 2024 at 11:23:13PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
>> I'm admittedly not an expert in the multi-byte code, but since there are
>> encodings like LATIN1 that use a byte per character, don't we need to do
>> multiple lookups any time the NAMEDATALEN-1'th byte is non-ASCII?
>
> I don't think so, but maybe I'm missing something. An important
> property of backend-legal encodings is that all bytes of a multibyte
> character have their high bits set. Thus if the NAMEDATALEN-2'th
> byte does not have that, it is not part of a multibyte character.
> That's also the reason we can stop if we reach a high-bit-clear
> byte while backing up to earlier bytes.
That's good to know. If we can assume that 1) all bytes of a multibyte
character have the high bit set and 2) all multibyte characters actually
require multiple bytes, then there are just a handful of cases that require
multiple lookups, and we can restrict even those to some extent, too.
--
nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-11-20 15:39:35 | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Previous Message | Nathan Bossart | 2024-11-20 15:32:47 | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |