Re: utf8 vs UTF-8

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Troels Arvin <troels(at)arvin(dot)dk>, pgsql-general(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: utf8 vs UTF-8
Date: 2024-05-18 15:01:17
Message-ID: f510e041-7e9b-4745-847b-06b9dcce6281@aklaver.com
Lists: pgsql-general

On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
>
> Tom Lane wrote:
> >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
> >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
> >
> > On most if not all platforms, both those spellings of the locale names
> > will be taken as valid.  You might try running "locale -a" to get an
> > idea of which one is preferred according to your current libc
> > installation
>
> "locale -a" on the Ubuntu system outputs this:
>
>   C
>   C.utf8
>   en_US.utf8
>   POSIX

If you expand that to "locale -v -a", you get:

locale: en_US.utf8 archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
title | English locale for the USA
source | Free Software Foundation, Inc.
address | https://www.gnu.org/software/libc/
email | bug-glibc-locales(at)gnu(dot)org
language | American English
territory | United States
revision | 1.0
date | 2000-06-24
codeset | UTF-8

> So at first, I thought en_US.utf8 would be the most correct locale
> identifier. However, when I look at Postgres' own databases, they have
> the slightly different locale string:
>
>   psql --list | grep -E 'postgres|template'
>   postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>   template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>   template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>
> Also, when I try to create a database with "en_US.utf8" as locale
> without specifying a template:
>
> troels=# create database test4 locale 'en_US.utf8';
> ERROR:  new collation (en_US.utf8) is incompatible with the collation of
> the template database (en_US.UTF-8)
> HINT:  Use the same collation as in the template database, or use
> template0 as template.

I'm going to say that is Postgres being exact to a fault.
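
To illustrate why both spellings end up at the same locale: as far as I know, glibc normalizes the codeset part of a locale name (dropping punctuation and lowercasing it) before looking it up, whereas the error above suggests Postgres compares the names as written. Below is a minimal Python sketch of that kind of normalization. The function names are hypothetical, and this is an approximation for illustration, not glibc's or Postgres' actual code.

```python
def normalize_codeset(codeset: str) -> str:
    """Approximate a libc-style codeset normalization (assumption):
    keep only ASCII letters and digits, lowercased."""
    return "".join(c.lower() for c in codeset if c.isascii() and c.isalnum())


def split_locale(name: str) -> tuple:
    """Split 'language_TERRITORY.codeset@modifier' into comparable parts."""
    base, _, modifier = name.partition("@")
    lang_territory, _, codeset = base.partition(".")
    return (lang_territory, normalize_codeset(codeset), modifier)


def same_locale(a: str, b: str) -> bool:
    """True if two locale names differ only in how the codeset is spelled."""
    return split_locale(a) == split_locale(b)


if __name__ == "__main__":
    print(same_locale("en_US.utf8", "en_US.UTF-8"))
    print(same_locale("en_US.utf8", "de_DE.utf8"))
```

A byte-for-byte comparison treats "en_US.utf8" and "en_US.UTF-8" as different names, which matches the error shown above; a comparison after this kind of normalization would treat them as the same.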

>
> Given the locale of Postgres' own databases and Postgres' error message,
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
>
>
> > but TBH, I doubt it's worth worrying about.
>
> But couldn't there be an issue, if for example the client's locale and
> the server's locale aren't exactly the same? I'm thinking maybe the
> client library has to perform unneeded translation of the stream of data
> to/from the database?

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
