From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Igal Sapir" <igal(at)lucee(dot)org> |
Cc: | "stan" <stanb(at)panix(dot)com>,"pgsql-general" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Case Insensitive Comparison with Postgres 12 |
Date: | 2019-10-12 13:17:55 |
Message-ID: | a9900eab-fe6f-414a-84c9-9fee0d39d6ac@manitou-mail.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Igal Sapir wrote:
> > Out of curiosity is there a eason not to use the citext type for th?
> >
> >
> Using the collation seems like a much cleaner approach, and I trust ICU to
> do a better job at comparing strings according to language rules etc.
One notable difference between citext and case-insensitive collations
by ICU is that the latter recognizes canonically equivalent sequences
of codepoints [1] as equal, while the former does not.
For instance:
=# CREATE COLLATION ci (locale='und(at)colStrength=secondary',
provider='icu', deterministic=false);
=# SELECT E'E\u0302TES'::citext = 'Êtes'::citext AS "citext-equal",
E'E\u0302TES' = 'Êtes' collate "ci" AS "ci-equal";
citext-equal | ci-equal
--------------+----------
f | t
Another significant difference is that building or rebuilding an index on a
text column with a CI collation appears to be way faster than with citext
(I've seen 10:1 ratios, but do your own tests).
On the minus side, substring matching with LIKE or other methods
is not possible with CI collations whereas it does work with citext.
[1] https://en.wikipedia.org/wiki/Unicode_equivalence
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Moench-Tegeder | 2019-10-12 13:27:13 | Re: SELECT d02name::bytea FROM ... && DBI::Pg |
Previous Message | Christoph Moench-Tegeder | 2019-10-12 13:14:24 | Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server |