Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2024-08-14 22:43:50
Message-ID: CA+hUKGK57sgUYKO03jB4VarTsswfMyScFAyJpVnYD8c+g12_mg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 7, 2024 at 7:07 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > > 2. I don't see a good way to canonicalize a locale name, like in
> > > check_locale(), which uses the result of setlocale().
> >
> > What I can tell you about that is that check_locale's expectation
> > that setlocale does any useful canonicalization is mostly wishful
> > thinking [1]. On a lot of platforms you just get the input string
> > back again. If that's the only thing keeping us on setlocale,
> > I think we could drop it. (Perhaps we should do some canonicalization
> > of our own instead?)
>
> +1
>
> I know it does something on Windows (we know the EDB installer gives
> it strings like "Language,Country" and it converts them to
> "Language_Country.Encoding", see various threads about it all going
> wrong), but I'm not sure it does anything we actually want to
> encourage. I'm hoping we can gradually screw it down so that we only
> have sane BCP 47 in the system on that OS, and I don't see why we
> wouldn't just use them verbatim.

Some more thoughts on check_locale() and canonicalisation:

I doubt the canonicalisation does anything useful on any Unix system,
as locale names there are basically just file names. In the case of
glibc, the encoding part is munged before opening the file, so it
tolerates .utf8 or .UTF-8 or .u---T----f------8 on input, but it still
returns whatever you gave it, so the return value isn't cleaning the
input or anything.
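
To make the "returns whatever you gave it" point checkable, here is a minimal sketch. The helper name setlocale_echoes_input is made up for this example, and the outcome for any particular name depends on the libc and on which locales are installed, so treat it as a probe rather than a statement of what every platform does.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical probe: ask setlocale() for "name" in one category and
 * report whether the return value is the input string unchanged.
 *
 * Returns  1 if the name was accepted and echoed back verbatim,
 *          0 if it was accepted but a different string came back,
 *         -1 if it was rejected (for example, locale not installed).
 *
 * Note that this changes the process-wide locale for that category.
 */
static int
setlocale_echoes_input(int category, const char *name)
{
    const char *result;

    assert(name != NULL);

    result = setlocale(category, name);
    if (result == NULL)
        return -1;

    return strcmp(result, name) == 0 ? 1 : 0;
}
```

On a glibc system with en_US.UTF-8 installed, calling it with "en_US.utf8" or "en_US.u---T----f------8" is a quick way to see whether the odd spelling comes back unchanged, as described above.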

"" is a problem however... for the special value meaning "native
environment", setlocale() returns a real locale name, which we probably
still need in places. We could change that to newlocale("") + query
instead, but there is a portability pipeline problem getting the name
out of it:

1. POSIX only just added getlocalename_l() in 2024[1][2].
2. Glibc has non-standard nl_langinfo_l(NL_LOCALE_NAME(category), loc).
3. The <xlocale.h> systems (macOS/*BSD) have non-standard
   querylocale(mask, loc).
4. AFAIK there is no way to do it on pure POSIX 2008 systems.
5. For Windows, there is a completely different thing to get the
   user's default locale, see CF#3772.

The systems in category 4 would in practice be Solaris and (if it
comes back) AIX. Given that, we probably just can't go that way soon.
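
For illustration only, the newlocale("") + query route for categories 1 to 4 might look roughly like the sketch below. The function name native_locale_name, the use of _POSIX_VERSION to detect getlocalename_l(), and the exact platform macros tested are all assumptions made for this example, not anything that exists in the tree. Windows (category 5) is left out, and pure POSIX 2008 systems simply get NULL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* asks glibc to expose NL_LOCALE_NAME */
#endif
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#if defined(__GLIBC__)
#include <langinfo.h>
#elif defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>
#endif

/*
 * Hypothetical helper: find out which real locale "" resolves to for one
 * category.  "mask" must be the LC_*_MASK matching "category".  The name
 * is copied into buf because the query functions return pointers that
 * may not outlive the locale object.  Returns buf, or NULL if the
 * locale could not be created, the name did not fit, or this platform
 * has no way to ask.
 */
static const char *
native_locale_name(int category, int mask, char *buf, size_t buflen)
{
    locale_t    loc;
    const char *name = NULL;

    assert(buf != NULL && buflen > 0);
    (void) category;            /* not needed on every branch below */

    loc = newlocale(mask, "", (locale_t) 0);
    if (loc == (locale_t) 0)
        return NULL;

#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 202405L
    name = getlocalename_l(category, loc);              /* POSIX 2024 */
#elif defined(__GLIBC__) && defined(NL_LOCALE_NAME)
    name = nl_langinfo_l(NL_LOCALE_NAME(category), loc);    /* glibc */
#elif defined(__APPLE__) || defined(__FreeBSD__)
    name = querylocale(mask, loc);                      /* <xlocale.h> */
#else
    name = NULL;                /* pure POSIX 2008: no way to ask */
#endif

    if (name != NULL && strlen(name) < buflen)
    {
        strcpy(buf, name);
        name = buf;
    }
    else
        name = NULL;

    freelocale(loc);
    return name;
}
```

A caller would use it as native_locale_name(LC_COLLATE, LC_COLLATE_MASK, buf, sizeof(buf)) and has to cope with NULL, which is exactly the case that makes this approach hard to rely on.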

So I think the solution could perhaps be something like: in some early
startup phase before there are any threads, we nail down all the
locale categories to "C" (or whatever we decide on for the permanent
global locale), and also query the "" categories and make a copy of
them in case anyone wants them later, and then never call setlocale()
again.
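
A minimal sketch of that startup idea, assuming "C" as the permanent global locale. The names freeze_global_locale and native_name_for, the particular list of categories, and the fallback to "C" when the environment's settings are rejected are all invented for this example.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Categories to remember; an illustrative subset, not a definitive list. */
static const int saved_categories[] = {
    LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, LC_TIME
};
#define N_SAVED (sizeof(saved_categories) / sizeof(saved_categories[0]))

static char *native_names[N_SAVED];     /* copies of what "" resolved to */
static int   locale_frozen = 0;

/* Copy a string into freshly allocated memory; NULL if out of memory. */
static char *
copy_string(const char *s)
{
    size_t  len = strlen(s) + 1;
    char   *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Call exactly once, early in startup, before any threads exist.
 * Records the native-environment name of each category, then pins the
 * global locale to "C".  Nothing should call setlocale() afterwards.
 */
static void
freeze_global_locale(void)
{
    size_t  i;

    assert(!locale_frozen);

    for (i = 0; i < N_SAVED; i++)
    {
        const char *name = setlocale(saved_categories[i], "");

        /* Assumption: if the environment's locale is rejected, record "C". */
        native_names[i] = copy_string(name != NULL ? name : "C");
    }

    setlocale(LC_ALL, "C");
    locale_frozen = 1;
}

/* Look up a saved name; NULL for categories that were not recorded. */
static const char *
native_name_for(int category)
{
    size_t  i;

    for (i = 0; i < N_SAVED; i++)
    {
        if (saved_categories[i] == category)
            return native_names[i];
    }
    return NULL;
}
```

Later code that wants to know what the user's environment asked for would consult native_name_for(LC_COLLATE) and friends instead of calling setlocale() again.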

[1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getlocalename_l.html
[2] https://www.austingroupbugs.net/view.php?id=1220
