Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2024-08-14 02:31:20
Message-ID: CA+hUKGL82jG2PdgfQtwWG+_51TQ--6M9XNa3rtt7ub+S3Pmfsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 14, 2024 at 1:05 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> The only alternative is to essentially ban the use of non-_l variants,
> which is fine I suppose, but causes a fair amount of code churn.

Let's zoom out a bit and consider some ways we could set up the
process, threads and individual calls:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

XXX Note that nailing LC_ALL to "C" in the backend would extend our
existing policy for LC_NUMERIC to all categories. That's why we can
use strtod() in the backend and expect the radix character to be '.'.
It's interesting to contemplate the strtod() calls in CF#5166: they
are code copied-and-pasted from backend and frontend; in the backend
we can use strtod() currently but in the frontend code I showed a
change to strtod_l(..., PG_C_LOCALE), in order to be able to delete
some ugly deeply nested uselocale()/setlocale() stuff of the exact
sort that NetBSD hackers (and I) hate. It's obviously a bit of a code
smell that it's copied-and-pasted in the first place, and should
really share code. Supposing some of that stuff finishes up in
src/common, then I think you'd want a strtod_l(..., PG_C_LOCALE) that
could be allowed to take advantage of the knowledge that the global
locale is "C" in the backend. Just thoughts...

2. The process global locale is always "C". Each backend process (in
future: thread) calls uselocale() to set the thread-local locale to
the database default, so it can keep using the non-_l() functions as a
way to access the database default, and otherwise uses _l() functions
if it wants something else (as we do already). The "C" locale can be
accessed with foo_l(..., LC_GLOBAL_LOCALE) or PG_C_LOCALE etc.

XXX This option is blocked by NetBSD's rejection of uselocale(). I
guess if you really wanted to work around NetBSD's policy you could
make your own wrapper for all affected functions, translating foo() to
foo_l(..., pg_thread_current_locale), so you could write uselocale(),
which is pretty much what every other libc does... But eughhh

3. The process global locale is inherited from the system or can be
set by the user however they want for the benefit of extensions, but
we never change it after startup or refer to it. Then we do the same
as 1 or 2, except if we ever want "C" we'll need a locale_t for that,
again perhaps using the PC_C_LOCALE mechanism. Non-_l() functions are
effectively useless except in cases where you really want to use the
system's settings inherited from startup, eg for messages, so they'd
mostly be banned.

What else?

> > They're right that we really just want to use "C" in some places, and
> > their LC_C_LOCALE is a very useful system-provided value to be able
> > to
> > pass into _l functions. It's a shame it's non-standard, because
> > without it you have to allocate a locale_t for "C" and keep it
> > somewhere to feed to _l functions...
>
> If we're going to do that, why not just have ascii-only variants of our
> own? pg_ascii_isspace(...) is at least as readable as isspace_l(...,
> LC_C_LOCALE).

Yeah, I agree there are some easy things we should do that way. In
fact we've already established that scanner_isspace() needs to be used
in lots more places for that, even aside from thread-safety.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-08-14 02:35:34 RE: Conflict detection and logging in logical replication
Previous Message Zhijie Hou (Fujitsu) 2024-08-14 02:31:10 RE: Conflict detection and logging in logical replication