BUG #17571: PostgreSQL 15b2 ICU collation defined at db level sorts differently in collate clause vs implicit

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: daniel(dot)halsey(at)vectorsolutions(dot)com
Subject: BUG #17571: PostgreSQL 15b2 ICU collation defined at db level sorts differently in collate clause vs implicit
Date: 2022-08-04 14:56:32
Message-ID: 17571-8aa6361274d2e3c3@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17571
Logged by: Daniel Halsey
Email address: daniel(dot)halsey(at)vectorsolutions(dot)com
PostgreSQL version: Unsupported/Unknown
Operating system: Debian (Docker PG15b2 Bullseye)
Description:

I'm using an "official" 15beta2 bullseye Docker image for this: When using a
nondeterministic ICU collation as the default collation for a database,
sorting (ORDER BY) without a COLLATE subclause acts differently than the
same query with one. Instead of treating differently cased characters as
truly equivalent, it appears to treat casing as a tiebreaker within a
column. Multi-column ORDER BY clauses reveal this difference, as shown in
results from select statements in the demo code below. This doesn't appear
to duplicate an existing TODO, as far as I can determine.

-- Test code:
-----
-- as sa (postgres)

create collation if not exists "und-sorttest-x-icu"
(provider=icu,
locale='und-u-ks-level2-kc-false',
deterministic=false);

create database sorttest with
owner = postgres
encoding = 'UTF8'
locale_provider = 'icu'
icu_locale = 'und-sorttest-x-icu'
connection limit = -1
template = template0;

-- Connect to new "sorttest" db

create table sort_test (
id bigserial primary key,
sortableOne varchar(50),
sortableTwo varchar(50),
sortableThree int
);

insert into sort_test
(sortableOne, sortableTwo, sortableThree)
values
('Abc', 'B', 3),
('AbC', 'A', 3),
('AbC', 'B', 1),
('ABc', 'C', 3),
('AbC', 'C', 2),
('ABC', 'C', 1),
('ABc', 'A', 2)
;

-- Natural/index ordering
select * from sort_test;

-- Lower before upper in sortableOne, without regards to sortableTwo
secondary ordering
select * from sort_test
order by sortableOne, sortableTwo;

-- Truly non-deterministic sort on sortableOne, following secondary ordering
on sortableTwo
select * from sort_test
order by sortableOne collate "und-sorttest-x-icu", sortableTwo;

-- Additional test using tertiary sort
select * from sort_test
order by sortableOne, sortableTwo, sortableThree;

-- Additional test using tertiary sort
select * from sort_test
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;

-- LIKE clause treating default collation as deterministic
-- This is fine/preferred, since we can use it without specifying a separate
COLLATE subclause for the LIKE
select * from sort_test
where sortableOne like 'Ab%'
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;

-- ILIKE working as desired
select * from sort_test
where sortableOne ilike 'ab%'
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;

----- END Test code

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Japin Li 2022-08-04 14:59:07 Re: BUG #17570: Unrecognized node type for query with statistics on expressions
Previous Message Martijn van Oosterhout 2022-08-04 14:41:11 Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY