Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2024-08-15 08:46:15
Message-ID: CA+hUKGKcnC_nc1WH5sQfD1-ADgRKAoVHrkDLY9omgoAubrbu-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 15, 2024 at 11:00 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2024-08-15 at 10:43 +1200, Thomas Munro wrote:
> > So I think the solution could perhaps be something like: in some
> > early
> > startup phase before there are any threads, we nail down all the
> > locale categories to "C" (or whatever we decide on for the permanent
> > global locale), and also query the "" categories and make a copy of
> > them in case anyone wants them later, and then never call setlocale()
> > again.
>
> +1.

We currently nail down these categories:

/* We keep these set to "C" always. See pg_locale.c for explanation. */
init_locale("LC_MONETARY", LC_MONETARY, "C");
init_locale("LC_NUMERIC", LC_NUMERIC, "C");
init_locale("LC_TIME", LC_TIME, "C");

CF #5170 has patches to make it so that we stop changing them even
transiently, using locale_t interfaces to feed our caches of stuff
needed to work with those categories, so they really stay truly nailed
down.

It sounds like someone needs to investigate doing the same thing for
these two, from CheckMyDatabase():

if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
ereport(FATAL,
(errmsg("database locale is incompatible with
operating system"),
errdetail("The database was initialized with
LC_COLLATE \"%s\", "
" which is not recognized by setlocale().", collate),
errhint("Recreate the database with another locale or
install the missing locale.")));

if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
ereport(FATAL,
(errmsg("database locale is incompatible with
operating system"),
errdetail("The database was initialized with LC_CTYPE \"%s\", "
" which is not recognized by setlocale().", ctype),
errhint("Recreate the database with another locale or
install the missing locale.")));

How should that work? Maybe we could imagine something like
MyDatabaseLocale, a locale_t with LC_COLLATE and LC_CTYPE categories
set appropriately. Or should it be a pg_locale_t instead (if your
database default provider is ICU, then you don't even need a locale_t,
right?).

Then I think there is one quite gnarly category, from
assign_locale_messages() (a GUC assignment function):

(void) pg_perm_setlocale(LC_MESSAGES, newval);

I have never really studied gettext(), but I know it was just
standardised in POSIX 2024, and the standardised interface has _l()
variants of all functions. Current implementations don't have them
yet. Clearly we absolutely couldn't call pg_perm_setlocale() after
early startup -- but if gettext() is relying on the current locale to
affect far away code, then maybe this is one place where we'd just
have to use uselocale(). Perhaps we could plan some transitional
strategy where NetBSD users lose the ability to change the GUC without
restarting the server and it has to be the same for all sessions, or
something like that, until they produce either gettext_l() or
uselocale(), but I haven't thought hard about this part at all yet...

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-08-15 08:49:11 Re: On non-Windows, hard depend on uselocale(3)
Previous Message Zhijie Hou (Fujitsu) 2024-08-15 08:06:03 RE: [BUG?] check_exclusion_or_unique_constraint false negative