<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
</head><body><p>The weird thing is, that this behavior only occures when used on a field field a primary key:</p><p>If you do: </p><p>create database testdb;</p><p>\c testdb;</p><p>-- Just create a simple table with one column<br>create table icutest(data text not null collate "de-x-icu" primary key, data2 text collate "de-x-icu");</p><p>-- Insert a record with uppercase string<br>insert into icutest values ('MYTEST','MYTEST');</p><p>-- This is not giving a match<br>select * from icutest where data ilike 'mytest';</p><p>-- BUT THIS GIVES A MATCH:</p><p>select * from icutest where data2 ilike 'mytest';</p><p>-- So it seems to be especially related to the scenario where a primary key / index exists.</p><blockquote type="cite">On July 4, 2019 at 12:36 PM Daniel Verite <<a href="mailto:daniel(at)manitou-mail(dot)org">daniel(at)manitou-mail(dot)org</a>> wrote:<br><br><br> PG Bug reporting form wrote:<blockquote type="cite">-- Just create a simple table with one column <br>create table icutest(data text not null collate "de-x-icu" primary key); <br><br>-- Insert a record with uppercase string <br>insert into icutest values ('MYTEST'); <br><br>-- This is not giving a match <br>select * from icutest where data ilike 'mytest';</blockquote><br>This also happens on v10 and on the master branch.<br><br>The bug seems to come from a mistake in like_support.c:<br><br> <br>/* * Check whether char is a letter (and, hence, subject to case-folding) <br> * * In multibyte character sets or with ICU, we can't use isalpha, and it does * not seem worth trying to convert to wchar_t to use iswalpha. Instead,<br>just * assume any multibyte char is potentially case-varying. <br> */<br>static int<br>pattern_char_isalpha(char c, bool is_multibyte,<br> pg_locale_t locale, bool locale_is_c)<br>{<br> if (locale_is_c)<br> return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');<br> else if (is_multibyte && IS_HIGHBIT_SET(c))<br> return true;<br> else if (locale && locale->provider == COLLPROVIDER_ICU)<br> return IS_HIGHBIT_SET(c) ? true : false;<br><br><br>With an ICU locale, this returns false for all characters in 'mytest'.<br><br>I think this eventually leads the caller to incorrectly believe that it<br>can optimize the test into an exact match (data='mytest'), given<br>there are otherwise no wildcards in the pattern.<br><br>On fixing the bug, if we make this function returns true for all<br>characters under an ICU locale, it appears to work, but we're loosing an<br>opportunity to optimize for some patterns.<br>If OTOH we wanted to use an ICU call like u_isalpha(), to be closer<br>to what's done with libc, we'd need to pass a UChar32 argument,<br>not a char, and since we're in a char-oriented context, I don't see how<br>to do that.<br><br><br>Best regards,<br>-- <br>Daniel Vérité<br>PostgreSQL-powered mailer: <a href="http://www.manitou-mail.org" rel="noopener" target="_blank">http://www.manitou-mail.org</a><br>Twitter: @DanielVerite</blockquote></body></html>