Re: sort order for UTF-8 char column with Japanese UTF-8

From: Matthias Apitz <guru(at)unixarea(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: sort order for UTF-8 char column with Japanese UTF-8
Date: 2022-02-04 08:29:05
Message-ID: 20220204082905.GA28@sh4-5.1blu.de
Lists: pgsql-general

On Thursday, February 03, 2022 at 10:00:37 -0500, Tom Lane wrote:

> Matthias Apitz <guru(at)unixarea(dot)de> writes:
> > On Thursday, February 03, 2022 at 11:14:55 +0100, Matthias Apitz wrote:
> >> With ESQL/C on a PostgreSQL 13.1 server I see the result of this query:
> >> select katkey,normform from swd_anzeige where normform >= 'A' ORDER BY normform ASC;
> >> coming out in this order:
> >> ...
> >> I loaded the same table in my server, but can't get the same order with
> >> psql:
>
> Do the two machines produce the same results if you sort the data in
> question with sort(1)? (Being careful to set LANG=de_DE.UTF-8 of
> course.) I rather doubt this has anything to do with Postgres as such;
> there are lots of inter-system and inter-release discrepancies in
> collation behavior.

No, they do not. I asked the admin of the remote (customer) server for
the output of sort(1) under different LANG and LC_ALL settings (see
below). With the UTF-8 locale variables set, the Japanese lines sort
near the beginning; only after she unset those variables did they sort
at the end.

On my own server (the only difference is that it runs SUSE Linux
Enterprise Server 15 SP3, while the customer still runs SP2), I never
get the Japanese lines on top with the same commands the remote admin
used. I have now also requested the output of

ls -l /lib64/libc.* /usr/lib/locale/de_DE.utf8

to see whether the libc version differs; mine is libc-2.31.so.
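
As a complement to the ls listing, the two machines could also be compared with a small sketch like the following. It assumes a glibc-based system where getconf knows GNU_LIBC_VERSION and where locale(1) is installed; the function names libc_version and german_locales are made up for illustration and both fall back to a placeholder string when the tool is missing.

```shell
# Print the C library version as reported by getconf (glibc-specific variable).
# Falls back to "unknown" on systems where the variable is not supported.
libc_version() {
  getconf GNU_LIBC_VERSION 2>/dev/null || echo "unknown"
}

# List the installed locales whose names start with de_DE (case-insensitive).
# Prints "none found" if locale(1) is missing or nothing matches.
german_locales() {
  locale -a 2>/dev/null | grep -i '^de_DE' || echo "none found"
}

libc_version
german_locales
```

Running this on both servers and diffing the output would show whether the libc release or the set of installed de_DE locales differs.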

That said, does the sorting done by the server depend on the environment
(LANG, LC_*) in which the PostgreSQL server is started, or only on the
pg_catalog information of the database in question?
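
The catalog side of this question can at least be inspected directly. A minimal sketch, assuming psql is on the PATH and the connection settings come from the usual PGHOST, PGUSER, etc.; the helper name show_db_collation is hypothetical, while datcollate and datctype are the per-database columns of pg_database:

```shell
# Hypothetical helper: list every database together with the collation
# and ctype recorded for it in the pg_database catalog.
# Extra arguments (for example -d postgres) are passed through to psql.
show_db_collation() {
  psql -At -F ' | ' \
    -c "SELECT datname, datcollate, datctype FROM pg_database ORDER BY datname" \
    "$@"
}

# Usage (not executed here, since it needs a running server):
#   show_db_collation -d postgres
```

Comparing that output with the LANG and LC_* values in the server's startup environment on both machines would show which of the two actually differs.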

Thanks

matthias

LC_ALL=de_DE.UTF-8 sort swd
A
ゲアハルト・A・リッター
ゲルハルト・A・リッター
チャールズ・A・ビアード
A010STRUKTUR
A010STRUKTUR
A010STRUKTUR
A0150SUPRALEITER

LANG=de_DE.UTF-8 sort swd
A
ゲアハルト・A・リッター
ゲルハルト・A・リッター
チャールズ・A・ビアード
A010STRUKTUR
A010STRUKTUR
A010STRUKTUR
A0150SUPRALEITER

sort swd
A
ゲアハルト・A・リッター
ゲルハルト・A・リッター
チャールズ・A・ビアード
A010STRUKTUR
A010STRUKTUR
A010STRUKTUR
A0150SUPRALEITER

env | grep LC
LC_ALL=de_DE.UTF-8

env | grep LANG
LANG=de_DE.UTF-8

unset LC_ALL LC_COLLATE LANG
sort swd
A
A010STRUKTUR
A010STRUKTUR
A010STRUKTUR
A0150SUPRALEITER
ゲアハルト・A・リッター
ゲルハルト・A・リッター
チャールズ・A・ビアード
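
The last listing, taken with the locale variables unset, matches a plain byte-order comparison. A small sketch that forces this ordering regardless of the caller's environment; the sample lines are copied from the listings above with the duplicates dropped, and the function names are made up for illustration:

```shell
# Sample lines from the listings above, deliberately out of order.
sample_lines() {
  printf '%s\n' \
    'ゲアハルト・A・リッター' \
    'A0150SUPRALEITER' \
    'チャールズ・A・ビアード' \
    'A' \
    'A010STRUKTUR'
}

# Sort stdin by raw byte values, ignoring the caller's LANG / LC_* settings.
sort_bytewise() {
  LC_ALL=C sort
}

sample_lines | sort_bytewise
```

In byte order every ASCII line comes before the katakana lines, because ASCII characters are single bytes below 0x80 and the UTF-8 encoding of katakana starts with the byte 0xE3.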

--
Matthias Apitz, ✉ guru(at)unixarea(dot)de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
