Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From: James Lucas <jlucasdba(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Date: 2020-05-27 15:23:06
Message-ID: CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi all,

Wanted to call out what seems like a possible bug in non-deterministic
collation handling with pattern matching operators. Per the
documentation, non-deterministic collations are not supported with
pattern matching operators. Section 9.7 of the PG12 manual recommends
"The pattern matching operators of all three kinds do not support
nondeterministic collations. If required, apply a different collation
to the expression to work around this limitation." However, I'm
finding that pattern matching operations fail when a column is
declared with a non-deterministic collation, *even if* a different,
deterministic collation is explicitly applied to the pattern matching
operation. This doesn't seem to be the expected behavior.

Example. This is tested on Postgres 12.3, on Centos 8.1.1911 with libicu 60.3.

Create a non-deterministic collation.
create collation mycollation (provider = icu, locale =
'en-US-ks-level2.utf8', deterministic = false);

Create a couple of sample tables:
create table ctest (id numeric, t text);
create table ctestnd (id numeric, t text collate mycollation);

Populate them with some data:
insert into ctest values (1,'aAa');
insert into ctest select generate_series(2,100000),'bbb';
insert into ctestnd select id, t from ctest;
analyze ctest, ctestnd;

Add a few indexes:
create index ctest_idx01 on ctest (t);
create index ctest_idx02 on ctest (t collate "C");
create index ctestnd_idx01 on ctestnd (t);
create index ctestnd_idx02 on ctestnd (t collate "C");

Test on ctest:
explain select * from ctest where t = 'aAa' collate "C";
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using ctest_idx02 on ctest (cost=0.42..4.44 rows=1 width=10)
Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctest where t like 'a%';
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using ctest_idx02 on ctest (cost=0.42..8.44 rows=1 width=10)
Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
Filter: (t ~~ 'a%'::text)
COMMENT: Actually this is very interesting, because even without an
explicit COLLATE clause, LIKE still uses the "C" collation index. Not
sure if that's intended behavior either?

explain select * from ctest where t like 'a%' collate "C";
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using ctest_idx02 on ctest (cost=0.42..8.44 rows=1 width=10)
Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
Filter: (t ~~ 'a%'::text COLLATE "C")
COMMENT: Uses explicit collation and index as expected.

Test on ctestnd:
explain select * from ctestnd where t = 'aAa' collate "C";
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using ctestnd_idx02 on ctestnd (cost=0.42..4.44 rows=1 width=10)
Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctestnd where t like 'a%';
ERROR: nondeterministic collations are not supported for LIKE
COMMENT: Fails as expected.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR: nondeterministic collations are not supported for LIKE
COMMENT: Not expected. It seems like the explicit COLLATE clause is
ignored in this case. I've tried different placements for the COLLATE
clause, and none seem to work.

Is this a bug, or have I missed something?

Thanks,
James Lucas

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2020-05-27 15:53:04 Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Previous Message Tom Lane 2020-05-27 12:18:43 Re: Re[2]: