Re: utf8 vs UTF-8

From: Troels Arvin <troels(at)arvin(dot)dk>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: utf8 vs UTF-8
Date: 2024-05-18 14:48:36
Message-ID: 89165125-54b6-46a2-9b2c-0a7e275596bf@arvin.dk
Lists: pgsql-general

Hello,

Tom Lane wrote:
>>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
>>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
>
> On most if not all platforms, both those spellings of the locale names
> will be taken as valid.  You might try running "locale -a" to get an
> idea of which one is preferred according to your current libc
> installation

"locale -a" on the Ubuntu system outputs this:

  C
  C.utf8
  en_US.utf8
  POSIX

On a CentOS 7 system, it's more or less the same:

  locale -a | grep -i en_us
  en_US
  en_US.iso88591
  en_US.iso885915
  en_US.utf8

So at first, I thought en_US.utf8 would be the most correct locale
identifier. However, when I look at Postgres' own databases, they have
the slightly different locale string:

  psql --list | grep -E 'postgres|template'
  postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
  template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
  template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...

Also, when I try to create a database with "en_US.utf8" as locale
without specifying a template, I get an error:

troels=# create database test4 locale 'en_US.utf8';
ERROR:  new collation (en_US.utf8) is incompatible with the collation of
the template database (en_US.UTF-8)
HINT:  Use the same collation as in the template database, or use
template0 as template.

Given the locale of Postgres' own databases and Postgres' error message,
I'm leaning towards en_US.UTF-8 being the most correct locale to use.
After all, why would Postgres care about the spelling if utf8 vs UTF-8
doesn't matter?
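
To make the distinction concrete, here is a rough Python sketch of the two
kinds of comparison that seem to be in play. It rests on two assumptions,
not on anything confirmed in this thread: that glibc normalizes the codeset
part of a locale name by dropping punctuation and lowercasing it (which
would explain why "locale -a" only lists en_US.utf8), and that Postgres'
template check compares the locale strings as spelled, which is what the
error message suggests. The function names are made up for illustration.

```python
def normalize_codeset(codeset: str) -> str:
    """Approximate glibc-style codeset normalization (assumption):
    keep only ASCII letters and digits, lowercase them, and prefix
    "iso" when nothing but digits remains."""
    kept = "".join(
        ch for ch in codeset if ch.isascii() and ch.isalnum()
    ).lower()
    return "iso" + kept if kept.isdigit() else kept


def normalize_locale(name: str) -> str:
    """Normalize only the codeset part of lang_TERRITORY.codeset[@modifier]."""
    base, at, modifier = name.partition("@")
    lang, dot, codeset = base.partition(".")
    if not dot:
        # No codeset given (e.g. "en_US", "C", "POSIX"): leave untouched.
        return name
    return f"{lang}.{normalize_codeset(codeset)}{at}{modifier}"


def same_for_libc(a: str, b: str) -> bool:
    """Hypothetical helper: would libc resolve both names to the same locale?"""
    return normalize_locale(a) == normalize_locale(b)


def same_spelling(a: str, b: str) -> bool:
    """Hypothetical helper: a plain string comparison, as the
    CREATE DATABASE error message appears to imply."""
    return a == b


if __name__ == "__main__":
    template, requested = "en_US.UTF-8", "en_US.utf8"
    print("same for libc:  ", same_for_libc(template, requested))
    print("same as spelled:", same_spelling(template, requested))
```

If those assumptions hold, the two names are the same locale as far as libc
is concerned, and the error only reflects that the spellings differ.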

> but TBH, I doubt it's worth worrying about.

But couldn't there be an issue if, for example, the client's locale and
the server's locale aren't exactly the same? I'm thinking the client
library might have to perform unnecessary conversion of the stream of
data to/from the database.

--
Kind regards,
Troels
