Re: sort order for UTF-8 char column with Japanese UTF-8

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Matthias Apitz <guru(at)unixarea(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: sort order for UTF-8 char column with Japanese UTF-8
Date: 2022-02-03 21:50:48
Message-ID: CA+hUKGLR86ZK8dq0onE4ExMvtVU9w41ZpUsBjVxoddWzO1b0NA@mail.gmail.com
Lists: pgsql-general

On Fri, Feb 4, 2022 at 8:11 AM Matthias Apitz <guru(at)unixarea(dot)de> wrote:
> On my FreeBSD laptop the same file sorts as
>
> guru(at)c720-r368166:~ $ LANG=de_DE.UTF-8 sort swd
> A
> ゲアハルト・A・リッター
> ゲルハルト・A・リッター
> チャールズ・A・ビアード
> A010STRUKTUR
> A010STRUKTUR
> A010STRUKTUR
> A0150SUPRALEITER

Wow. It's one thing to have a different default "script order" than
glibc and ICU (which is something you can customise, IIRC), but isn't
something broken here if the Japanese text sorts between "A" and
"A0..."? Hmm, it's almost as if the Japanese text were ignored
completely. From my FreeBSD box:

tmunro=> select * from t order by x collate "de_DE.UTF-8";
x
--------------------------
ゲアハルト
A
ゲアハルト・A・リッター
A0
A010STRUKTUR
AA
ゲアハルト・AA・リッター
ゲアハルト・B・リッター
(8 rows)

tmunro=> select * from t order by x collate "ja_JP.UTF-8";
x
--------------------------
A
A0
A010STRUKTUR
AA
ゲアハルト
ゲアハルト・AA・リッター
ゲアハルト・A・リッター
ゲアハルト・B・リッター
(8 rows)

Seems like something to investigate in FreeBSD land.
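
The "almost as if it completely ignored the Japanese text" guess can be checked against the de_DE.UTF-8 output above with a small sketch. It models the hypothesis only and says nothing about how FreeBSD's collation code actually works: the helper names `ascii_only` and `modelled_sort` are made up for illustration, "ignored" is assumed to mean "every non-ASCII character is dropped before comparing", and the code-point tie-break between strings that compare equal after stripping is likewise an assumption.

```python
# Rows of table t, as shown in the query output above (the Latin letters
# inside the Japanese strings are assumed to be plain ASCII).
rows = [
    "A",
    "A0",
    "A010STRUKTUR",
    "AA",
    "ゲアハルト",
    "ゲアハルト・A・リッター",
    "ゲアハルト・AA・リッター",
    "ゲアハルト・B・リッター",
]

# Order returned by: select * from t order by x collate "de_DE.UTF-8";
observed_de_DE = [
    "ゲアハルト",
    "A",
    "ゲアハルト・A・リッター",
    "A0",
    "A010STRUKTUR",
    "AA",
    "ゲアハルト・AA・リッター",
    "ゲアハルト・B・リッター",
]


def ascii_only(s: str) -> str:
    """Hypothetical model of 'ignored' text: drop every non-ASCII character."""
    return "".join(ch for ch in s if ord(ch) < 128)


def modelled_sort(values):
    """Sort by the ASCII-only remainder; break ties by code point (assumed)."""
    return sorted(values, key=lambda s: (ascii_only(s), s))


if __name__ == "__main__":
    for original in modelled_sort(rows):
        # Show the key each row is effectively compared on under the model.
        print(repr(ascii_only(original)), original)
    print("matches de_DE.UTF-8 output:", modelled_sort(rows) == observed_de_DE)
```

Under this model, "ゲアハルト" reduces to the empty string and sorts first, and "ゲアハルト・A・リッター" reduces to "A", which places it between "A" and "A0". That is consistent with the hypothesis for these eight rows, but it does not prove that this is what the locale does.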
