From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Martin Flahault <martin(at)billjobs(dot)com> |
Cc: | Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Collate order on Mac OS X, text with diacritics in UTF-8 |
Date: | 2010-01-13 22:02:18 |
Message-ID: | 20100113220218.GB23892@svana.org |
Lists: | pgsql-general |
On Wed, Jan 13, 2010 at 04:15:06PM +0100, Martin Flahault wrote:
[postgres]
> newbase=# select * from t1 order by contenu;
> contenu
> ---------
> A
> E
> a
> e
> à
PostgreSQL outputs whatever the C library does on the underlying
system. The quality of this varies wildly.
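The order above is exactly what you get by comparing the raw UTF-8 bytes. Uppercase "A" and "E" (0x41, 0x45) come before lowercase "a" and "e" (0x61, 0x65), and "à" is encoded as 0xC3 0xA0, so it sorts after every ASCII letter. Below is a minimal sketch that reproduces the output this way. It assumes, rather than proves, that the C library in question effectively falls back to byte comparison for UTF-8 strings. `byte_order_key` is a hypothetical helper name.

```python
# Sketch: reproduce the ORDER BY output by sorting on raw UTF-8 bytes.
# Assumption: a C library with no real UTF-8 collation behaves like this.
words = ["à", "e", "a", "E", "A"]

def byte_order_key(s: str) -> bytes:
    # The UTF-8 encoding as the sort key; for UTF-8, byte order
    # is the same as Unicode code point order.
    return s.encode("utf-8")

print(sorted(words, key=byte_order_key))
# → ['A', 'E', 'a', 'e', 'à']
```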
> As with other DBMSs (MySQL for example), diacritics should be ignored when determining the sort order. Here is the expected output:
MySQL implements the Unicode Collation Algorithm, which means it
essentially does what you want.
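The key idea of the Unicode Collation Algorithm is comparison by levels. Base letters are compared first, while accents and case are ignored. Accents are only consulted when the base letters tie, and case is only consulted when the accents tie as well. The sketch below is a heavily simplified illustration of that idea using Python's `unicodedata`. It is not the full algorithm (no DUCET weights, contractions or locale tailoring) and not MySQL's implementation. `collation_key` is a hypothetical name.

```python
import unicodedata

def collation_key(s: str) -> tuple:
    """Simplified three-level key in the spirit of the UCA (illustrative only).

    Level 1: base letters, case-folded (accents and case ignored).
    Level 2: combining marks attached to each base letter (accents).
    Level 3: case, with lowercase ordered before uppercase.
    """
    primary, secondary, tertiary = [], [], []
    # NFD splits "à" into "a" followed by U+0300 COMBINING GRAVE ACCENT.
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += ch  # the accent belongs to the preceding letter
            continue
        primary.append(ch.casefold())
        secondary.append("")
        tertiary.append(1 if ch.isupper() else 0)
    # Tuples compare level by level, so later levels only break ties.
    return (primary, secondary, tertiary)

words = ["à", "e", "a", "E", "A"]
print(sorted(words, key=collation_key))
# → ['a', 'A', 'à', 'e', 'E']
```

With this key, "a", "A" and "à" all group together ahead of the "e" letters, which is the diacritic-insensitive grouping being asked for.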
>
> It seems there is a problem with the collating order on BSD systems with diacritics using UTF8.
Last I checked, BSD did not support useful sorting on UTF-8 at all, so
I'm not surprised it doesn't work.
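Because the result depends on the platform's C library, you can check what a given machine does by asking that library directly. Python's `locale.strxfrm` builds sort keys using the C library's collation for the current `LC_COLLATE`. The sketch below assumes that this mirrors what a locale-dependent program would see on that machine. The helper name `libc_sorted` is hypothetical. The locale name "fr_FR.UTF-8" is an assumption and must be installed, otherwise `locale.setlocale` raises `locale.Error`.

```python
import locale

def libc_sorted(words, locale_name):
    """Sort words using the C library's collation rules for locale_name."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm returns keys whose ordering matches strcoll() comparisons.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore global state

words = ["à", "e", "a", "E", "A"]
print(libc_sorted(words, "C"))  # the C locale compares code points
try:
    print(libc_sorted(words, "fr_FR.UTF-8"))  # varies with the C library
except locale.Error:
    print("fr_FR.UTF-8 is not installed on this machine")
```

On a C library that collates UTF-8 properly, the second line should group the accented letter with its base letter. On one that doesn't, it will look like the C-locale result.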
> in a UTF8 text file and use the "sort" command on it, you will have the same wrong output as with PostgreSQL :
Yes, that's the basic idea. Mac OS X apparently provides ICU underneath
for programs that would like true Unicode collation, but there is
little chance that PostgreSQL will ever use this.
Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.