Re: BUG #16570: Collation not working

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Arnaud Perrier" <arnaud(dot)perrier(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16570: Collation not working
Date: 2020-08-05 14:49:49
Message-ID: 118c09c5-3334-4570-8644-7c5300fbc0f2@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Arnaud Perrier wrote:

> 5/ SELECT * FROM TBL ORDER BY TEXT1 COLLATE digitslast, TEXT2 COLLATE
> digitslast;
> 5/ KO = TEXT1 = digits after letters + TEXT2 = digits before letters

I'm seeing wrong results too, and I don't really understand what's going
on with this collation.

Here's a simpler test with the current dev branch (14devel)
and ICU 60.2-3ubuntu.

=# CREATE COLLATION digitslast (provider = icu, locale =
'en(at)colReorder=latn-digit');

=# select * from (values('1'),('a')) as list(x) order by x collate
digitslast;
x
---
a
1

The digit is sorted after the letter, OK.

But then why would an inequality test report the opposite?

=# select 'a' < '1' collate digitslast;
?column?
----------
f

I'm wondering whether there is a discrepancy between the binary
sort keys and results of direct comparisons by ucol_strcoll()
for this collation.
Maybe that could explain why the second level of ORDER BY
produces results that are inconsistent with the first level.

With icu_ext we can visualize the sort keys and see that key('a')
if lower than key('1').

=# select icu_sort_key('a' collate digitslast) as sort_key_a,
icu_sort_key('1' collate digitslast) as sort_key_1

sort_key_a | sort_key_1
--------------+--------------
\x0f01050105 | \x4901050105

But then a direct comparison says the opposite, just like the
'<' operator as above.

=# select icu_compare('a', '1', 'en(at)colReorder=latn-digit');
icu_compare
-------------
1

icu_compare() is a simple interface on top of ucol_strcollUTF8()
or ucol_strcoll(), and 1 as a result maps to UCOL_GREATER, so unless
I'm missing something, this is looking more like an ICU bug than a
Postgres bug.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jan Kort 2020-08-05 16:39:20 Re: Index not used without explicit typecast
Previous Message Tom Lane 2020-08-05 14:32:22 Re: Index not used without explicit typecast