From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | daniel(dot)halsey(at)vectorsolutions(dot)com |
Subject: | BUG #17571: PostgreSQL 15b2 ICU collation defined at db level sorts differently in collate clause vs implicit |
Date: | 2022-08-04 14:56:32 |
Message-ID: | 17571-8aa6361274d2e3c3@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 17571
Logged by: Daniel Halsey
Email address: daniel(dot)halsey(at)vectorsolutions(dot)com
PostgreSQL version: Unsupported/Unknown
Operating system: Debian (Docker PG15b2 Bullseye)
Description:
I'm using an "official" 15beta2 bullseye Docker image for this: When using a
nondeterministic ICU collation as the default collation for a database,
sorting (ORDER BY) without a COLLATE subclause acts differently than the
same query with one. Instead of treating differently cased characters as
truly equivalent, it appears to treat casing as a tiebreaker within a
column. Multi-column ORDER BY clauses reveal this difference, as shown in
results from select statements in the demo code below. This doesn't appear
to duplicate an existing TODO, as far as I can determine.
-- Test code:
-----
-- as sa (postgres)
create collation if not exists "und-sorttest-x-icu"
(provider=icu,
locale='und-u-ks-level2-kc-false',
deterministic=false);
create database sorttest with
owner = postgres
encoding = 'UTF8'
locale_provider = 'icu'
icu_locale = 'und-sorttest-x-icu'
connection limit = -1
template = template0;
-- Connect to new "sorttest" db
create table sort_test (
id bigserial primary key,
sortableOne varchar(50),
sortableTwo varchar(50),
sortableThree int
);
insert into sort_test
(sortableOne, sortableTwo, sortableThree)
values
('Abc', 'B', 3),
('AbC', 'A', 3),
('AbC', 'B', 1),
('ABc', 'C', 3),
('AbC', 'C', 2),
('ABC', 'C', 1),
('ABc', 'A', 2)
;
-- Natural/index ordering
select * from sort_test;
-- Lower before upper in sortableOne, without regards to sortableTwo
secondary ordering
select * from sort_test
order by sortableOne, sortableTwo;
-- Truly non-deterministic sort on sortableOne, following secondary ordering
on sortableTwo
select * from sort_test
order by sortableOne collate "und-sorttest-x-icu", sortableTwo;
-- Additional test using tertiary sort
select * from sort_test
order by sortableOne, sortableTwo, sortableThree;
-- Additional test using tertiary sort
select * from sort_test
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;
-- LIKE clause treating default collation as deterministic
-- This is fine/preferred, since we can use it without specifying a separate
COLLATE subclause for the LIKE
select * from sort_test
where sortableOne like 'Ab%'
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;
-- ILIKE working as desired
select * from sort_test
where sortableOne ilike 'ab%'
order by sortableOne collate "und-sorttest-x-icu", sortableTwo,
sortableThree;
----- END Test code
From | Date | Subject | |
---|---|---|---|
Next Message | Japin Li | 2022-08-04 14:59:07 | Re: BUG #17570: Unrecognized node type for query with statistics on expressions |
Previous Message | Martijn van Oosterhout | 2022-08-04 14:41:11 | Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY |