| From: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> | 
|---|---|
| To: | Marc Millas <marc(dot)millas(at)mokadb(dot)com>, pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: sort order | 
| Date: | 2021-08-06 15:57:51 | 
| Message-ID: | 3940ae46-b123-f58e-aa58-e6f9309b5ef6@enterprisedb.com | 
| Lists: | pgsql-general | 
On 27.07.21 19:07, Marc Millas wrote:
> so, obviously, both lc_collate knows about the é
> but obviously, too, they do behave differently on the impact of the 
> beginning white space.
> 
> I didn't see anything about this behaviour on the doc, unless the 
> reference at the libc should be understood as please read and test libc 
> doc on each platform.
> So my first question is: why ?
> My second question is: how to make the centos postgres behave like the 
> w10 one ??
> ie. knowing about french characters AND taking beginning white spaces 
> into account ?
There are multiple standard ways to deal with space and punctuation 
characters when sorting.  See 
<https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html> 
for a description.  Not all collation providers implement all of them, 
but the behavior you happen to get is usually one of them.  The CentOS 7 
behavior corresponds to "shift-trimmed", while the Windows one appears to 
match "non-ignorable".  If you want the latter on Linux as 
well, you can use the ICU locales, which also default to non-ignorable. 
For example,
select * from test order by ble collate "fr-x-icu";
matches your Windows output for me.
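
To compare the two treatments of spaces side by side, something like the following sketch may help. It assumes a PostgreSQL build with ICU support (version 10 or later). The sample rows and the collation name fr_shifted are made up for illustration; only the table name test, the column ble, and the "fr-x-icu" collation come from the thread. Note that ICU's "shifted" setting is related to, but not the same as, the "shift-trimmed" behavior described above, so the second query is only an approximation of the CentOS 7 ordering.

```sql
-- Hypothetical sample data; the real contents of "test" are not shown in the thread.
CREATE TEMP TABLE test (ble text);
INSERT INTO test VALUES (' aaa'), ('bbb'), (' ccc'), ('ddd'), ('été');

-- Predefined ICU French collation. Spaces are non-ignorable by default,
-- so values with a leading space should sort ahead of the others.
SELECT ble FROM test ORDER BY ble COLLATE "fr-x-icu";

-- Custom ICU collation with alternate handling set to "shifted" (the "ka"
-- locale keyword), so spaces and punctuation are ignored at the primary
-- through tertiary comparison levels.
CREATE COLLATION fr_shifted (provider = icu, locale = 'fr-u-ka-shifted');
SELECT ble FROM test ORDER BY ble COLLATE fr_shifted;

-- Clean up the illustrative collation.
DROP COLLATION fr_shifted;
```

Naming the collation explicitly in ORDER BY, as here, leaves the database default untouched; a column could also be declared with COLLATE "fr-x-icu" if the ordering should apply everywhere.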