From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
---|---|
To: | Troels Arvin <troels(at)arvin(dot)dk>, pgsql-general(at)lists(dot)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: utf8 vs UTF-8 |
Date: | 2024-05-18 15:01:17 |
Message-ID: | f510e041-7e9b-4745-847b-06b9dcce6281@aklaver.com |
Lists: | pgsql-general |
On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
>
> Tom Lane wrote:
> >> test1 | loc_test | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8
> >> test3 | troels | UTF8 | libc | en_US.utf8 | en_US.utf8
> >
> > On most if not all platforms, both those spellings of the locale names
> > will be taken as valid. You might try running "locale -a" to get an
> > idea of which one is preferred according to your current libc
> > installation
>
> "locale -a" on the Ubuntu system outputs this:
>
> C
> C.utf8
> en_US.utf8
> POSIX
If you expand that to "locale -v -a", you get:
locale: en_US.utf8 archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
title | English locale for the USA
source | Free Software Foundation, Inc.
address | https://www.gnu.org/software/libc/
email | bug-glibc-locales(at)gnu(dot)org
language | American English
territory | United States
revision | 1.0
date | 2000-06-24
codeset | UTF-8
> So at first, I thought en_US.utf8 would be the most correct locale
> identifier. However, when I look at Postgres' own databases, they have
> the slightly different locale string:
>
> psql --list | grep -E 'postgres|template'
> postgres | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>
> Also, when I try to create a database with "en_US.utf8" as locale
> without specifying a template:
>
> troels=# create database test4 locale 'en_US.utf8';
> ERROR: new collation (en_US.utf8) is incompatible with the collation of
> the template database (en_US.UTF-8)
> HINT: Use the same collation as in the template database, or use
> template0 as template.
I'm going to say that is Postgres being exact to a fault.
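
To illustrate the difference between the two spellings, here is a minimal Python sketch. It assumes the error above comes from comparing the locale names as written, and that the C library treats codeset names loosely, ignoring case and punctuation. The helper names (split_locale, normalize_codeset, same_locale) are made up for this example and are not part of Postgres or glibc.

```python
def split_locale(name):
    """Split 'lang_TERRITORY.codeset@modifier' into (base, codeset, modifier)."""
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    return base, codeset, modifier


def normalize_codeset(codeset):
    """Lowercase the codeset and drop punctuation (assumed, simplified rule)."""
    return "".join(ch.lower() for ch in codeset if ch.isascii() and ch.isalnum())


def same_locale(a, b):
    """Compare two locale names while ignoring the codeset spelling."""
    base_a, codeset_a, modifier_a = split_locale(a)
    base_b, codeset_b, modifier_b = split_locale(b)
    return (base_a, normalize_codeset(codeset_a), modifier_a) == (
        base_b,
        normalize_codeset(codeset_b),
        modifier_b,
    )


if __name__ == "__main__":
    # Literal comparison: the two spellings are different strings.
    print("en_US.utf8" == "en_US.UTF-8")
    # Comparison after normalizing the codeset part.
    print(same_locale("en_US.utf8", "en_US.UTF-8"))
```

Under those assumptions the two names differ only as strings, which would fit the error being about spelling rather than about a different collation.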
>
> Given the locale of Postgres' own databases and Postgres' error message,
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
>
>
>> but TBH, I doubt it's worth worrying about.
>
> But couldn't there be an issue, if for example the client's locale and
> the server's locale aren't exactly the same? I'm thinking maybe the
> client library has to perform unneeded translation of the stream of data
> to/from the database?
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com