From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | 荒井元成 <n2029(at)ndensan(dot)co(dot)jp> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: collate not support Unicode Variation Selector |
Date: | 2022-08-03 00:41:51 |
Message-ID: | CA+hUKGLnJUososSwJLycKfA0TXRsciKxPJfqVED=aOMYE1knOw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 3, 2022 at 12:09 PM 荒井元成 <n2029(at)ndensan(dot)co(dot)jp> wrote:
> D209007=# create table ivstest ( moji text collate "ja-x-icu" CONSTRAINT firstkey PRIMARY KEY );
> D209007=# insert into ivstest (moji) values ( U&'\+003436' || U&'\+0E0101' || U&'\+00304D');
> D209007=# insert into ivstest (moji) values ( U&'\+003436' || U&'\+00304D');
> D209007=# select moji from ivstest where moji like '%' || U&'\+003436' || '%';
> -------------
> 㐶󠄁き
> 㐶き
> (2 rows)
>
> expected
> -------------
> 㐶き
> (1 row)
So you want to match only strings that contain U&'\+003436' *not*
followed by a variation selector (as we also discussed at [1]). I'm
pretty sure that everything in PostgreSQL considers variation
selectors to be separate characters. Perhaps it is possible to write
a regular expression covering the variation selector ranges, something
like '\U00003436([^\U000E0100-\U000E01EF]|$)'?
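To make the "separate characters" point concrete, here is a rough sketch using the two values from the quoted example. It uses Python's `re` module, not PostgreSQL's regex engine, so treat it only as an approximation of how the SQL would behave. The substring test stands in for the LIKE query, and the pattern excludes the Variation Selectors Supplement block (U+E0100 to U+E01EF) while still accepting U+3436 at the end of a string:

```python
import re

# The two values from the quoted example:
#   U+3436, U+E0101 (VARIATION SELECTOR-18), U+304D
#   U+3436, U+304D
rows = ["\u3436\U000E0101\u304d", "\u3436\u304d"]

# A variation selector is a code point of its own, so the lengths differ.
lengths = [len(r) for r in rows]

# Rough stand-in for LIKE '%' || U&'\+003436' || '%': a plain substring test.
like_matches = [r for r in rows if "\u3436" in r]

# U+3436 followed by anything outside U+E0100..U+E01EF, or by end of string.
pattern = re.compile("\u3436([^\U000E0100-\U000E01EF]|$)")
regex_matches = [r for r in rows if pattern.search(r)]


def code_points(s):
    """Render a string as U+XXXX code points, avoiding terminal encoding issues."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(lengths)  # [3, 2]
print([code_points(r) for r in like_matches])   # ['U+3436 U+E0101 U+304D', 'U+3436 U+304D']
print([code_points(r) for r in regex_matches])  # ['U+3436 U+304D']
```

The substring test, like LIKE, matches both rows because U+3436 is present in each; only the pattern that looks at the following code point separates them. The standard Variation Selectors block (U+FE00 to U+FE0F) is not covered by this range and would have to be added to the bracket expression if those selectors can occur in the data.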
Here's an example using Latin characters, which are easier for me to
work with but show approximately the same thing, since variation
selectors behave a bit like "combining" characters:
postgres=# create table t (x text);
CREATE TABLE
postgres=# insert into t values ('e'), ('ef'), ('e' || U&'\0301');
INSERT 0 3
postgres=# select * from t;
x
----
e
ef
é
(3 rows)
postgres=# select * from t where x ~ 'e([^\u0300-\u036f]|$)';
x
----
e
ef
(2 rows)
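The same filter can also be written with a negative lookahead, which needs no separate end-of-string alternative. PostgreSQL's advanced regular expressions document a `(?!re)` negative lookahead constraint, so something like `x ~ 'e(?![\u0300-\u036f])'` should behave the same way, although that exact query is untested here. The sketch below checks both forms against the three rows above, again using Python's `re` as an approximation of PostgreSQL's engine:

```python
import re

# The three rows from the session: "e", "ef", and "e" + U+0301
# COMBINING ACUTE ACCENT (displayed as "é", but two code points).
rows = ["e", "ef", "e\u0301"]

# Form used in the session: "e" followed by a non-combining character,
# or by the end of the string.
class_form = re.compile("e([^\u0300-\u036f]|$)")

# Alternative: "e" not followed by a combining mark; a lookahead also
# succeeds at the end of the string, so no "|$" branch is needed.
lookahead_form = re.compile("e(?![\u0300-\u036f])")

class_matches = [r for r in rows if class_form.search(r)]
lookahead_matches = [r for r in rows if lookahead_form.search(r)]

print(class_matches)      # ['e', 'ef']
print(lookahead_matches)  # ['e', 'ef']
```

Both forms exclude the row where "e" is followed by U+0301; the lookahead version is just shorter because it does not consume the following character.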
[1] https://www.postgresql.org/message-id/flat/013f01d873bb%24ff5f64b0%24fe1e2e10%24%40ndensan.co.jp