how to get collation according to Unicode Collation Algorithm?

From:	rudolf <stu3(dot)1(at)eq(dot)cz>
To:	pgsql-general(at)postgresql(dot)org
Subject:	how to get collation according to Unicode Collation Algorithm?
Date:	2013-04-06 10:57:22
Message-ID:	515FFF92.3040904@eq.cz
Lists:	pgsql-general

Hi,

I have a problem collating UTF-8 strings correctly in PostgreSQL 9.2.4
on Debian Linux 6.0 with the de_DE.utf8 locale (en_US behaves the same
way):

CREATE TABLE test_collation ( q text );
INSERT INTO test_collation (q) VALUES ('aa'), ('ac'), ('a&b');
SELECT * FROM test_collation ORDER BY q COLLATE "de_DE";
q
-----
aa
a&b
ac

I need the "&" character to sort at the beginning or at the end of
the alphabet, but it seems to be simply ignored. The space
character (" ") is treated the same way (just replace the ampersand in
the previous example with a space).

I ran the same test on the ICU site (http://site.icu-project.org/) and
there I get the collation I expect: 1. a&b, 2. aa, 3. ac. Screenshot:
http://software.eq.cz/icu_collation_de_DE.png

Is there a way to achieve this collation (note also the order of the
characters with umlauts in the screenshot) with PostgreSQL? Or is this a
glibc bug?
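
For comparison, here is a minimal sketch of one possible workaround, assuming plain byte-order sorting is acceptable: the built-in "C" collation compares strings byte by byte, so "&" (0x26) and the space (0x20) sort before every letter. It is not the Unicode Collation Algorithm, and the umlauts would sort after "z" rather than in the order shown in the screenshot, so it only covers the punctuation part of the question.

```sql
-- Sketch only: byte-order comparison via the built-in "C" collation.
-- Reuses the test_collation table defined above.
SELECT * FROM test_collation ORDER BY q COLLATE "C";
-- Expected order for the three sample rows: a&b, aa, ac

-- Possible variant: use "C" as the primary key and fall back to the
-- locale collation for rows that compare equal byte-wise.
SELECT * FROM test_collation ORDER BY q COLLATE "C", q COLLATE "de_DE";
```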

Thanks,

Responses

Re: how to get collation according to Unicode Collation Algorithm? at 2013-04-06 23:46:27 from Jasen Betts
