Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2024-08-07 20:52:41
Message-ID: CA+hUKGJUPPZZjZMGR047w=OrZgemZYoRrVPkvCdSO9iA56M0QA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 8, 2024 at 5:16 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> There are a ton of calls to, for example, isspace(), used mostly for
> parsing.
>
> I wouldn't expect a lot of differences in behavior from locale to
> locale, like might be the case with iswspace(), but behavior can be
> different at least in theory.
>
> So I guess we're stuck with setlocale()/uselocale() for a while, unless
> we're able to move most of those call sites over to an ascii-only
> variant.

We do know of a few isspace() calls that are already questionable[1]
(should be scanner_isspace(), or something like that). It's not only
weird that SELECT ROW('libertà!') is displayed with or without double
quotes depending (in theory) on your locale, it's also undefined
behaviour because we feed individual bytes of a multi-byte sequence to
isspace(). So OSes disagree: in practice we know that macOS and
Windows think that the byte 0xa0 inside 'à' (0xc3 0xa0 in UTF-8) is a
space while glibc and FreeBSD don't. Looking at the languages with
many sequences containing 0xa0, I guess you'd probably need to be
processing CJK text and cross-platform for the difference to become
obvious (that was the case for the problem report I analysed):

for i in range(1, 0xffff):
    if (i < 0xd800 or i > 0xdfff) and 0xa0 in chr(i).encode('UTF-8'):
        print("%04x: %s" % (i, chr(i)))
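
To make the byte-level problem concrete, here is a minimal sketch (not
PostgreSQL code). ascii_isspace() and space_bytes() are hypothetical
names used only for illustration; the check assumes the six whitespace
bytes that isspace() accepts in the "C" locale, in the spirit of an
ASCII-only variant. Fed the UTF-8 bytes of 'libertà!' one at a time, it
never flags the 0xa0 continuation byte, whatever the platform's locale
tables say:

```python
# Hypothetical ASCII-only whitespace test, applied byte by byte.
# Assumption: the set below is the whitespace accepted by C's isspace()
# in the "C" locale (space, \t, \n, \v, \f, \r).
ASCII_SPACE = frozenset(b" \t\n\v\f\r")


def ascii_isspace(byte: int) -> bool:
    """Return True only for ASCII whitespace; never for bytes >= 0x80."""
    return byte in ASCII_SPACE


def space_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of text that ascii_isspace() flags."""
    return [b for b in text.encode("UTF-8") if ascii_isspace(b)]


# U+00E0 (a with grave) encodes as the two bytes 0xc3 0xa0 in UTF-8.
assert "\u00e0".encode("UTF-8") == b"\xc3\xa0"

# The 0xa0 byte is present in the encoded string...
assert 0xA0 in "libert\u00e0!".encode("UTF-8")

# ...but the ASCII-only check does not mistake it for a space.
print(space_bytes("libert\u00e0!"))
```

Because the result depends only on the byte value, the quoting decision
for a string like this would no longer vary between operating systems.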

[1] https://www.postgresql.org/message-id/flat/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
