Re: collate not support Unicode Variation Selector

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: collate not support Unicode Variation Selector
Date: 2022-08-03 00:56:55
Message-ID: 2810887.1659488215@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> So you want to match only strings that contain U&'\+003436' *not*
> followed by a variation selector (as we also discussed at [1]). I'm
> pretty sure that everything in PostgreSQL considers variation
> selectors to be separate characters.

There might be something that doesn't, but LIKE certainly isn't it.
I don't believe plain LIKE is collation-aware at all; it just sees
characters to match or not match. ILIKE is a little collation-aware,
but it's still not going to consider a combining sequence as one
character. The same goes for the regex operators.
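
As an illustration of matching at the code point level, here is a minimal
sketch in Python (not PostgreSQL). It assumes VARIATION SELECTOR-17
(U+E0100) as the example selector, and the helper name has_bare_base is
hypothetical. It shows that the base character and the selector are two
separate code points, and that a negative lookahead can pick out only
occurrences of U+3436 that are not followed by a variation selector.

```python
import re

BASE = "\u3436"        # the CJK ideograph discussed in the thread
VS17 = "\U000E0100"    # VARIATION SELECTOR-17 (assumed example selector)

# Variation selectors occupy U+FE00..U+FE0F and U+E0100..U+E01EF.
# The negative lookahead rejects a BASE that is followed by one of them.
BARE_BASE = re.compile(BASE + r"(?![\uFE00-\uFE0F\U000E0100-\U000E01EF])")


def has_bare_base(s: str) -> bool:
    """Return True if s contains BASE not followed by a variation selector."""
    return BARE_BASE.search(s) is not None


# The variation sequence is two code points, not one "character".
sequence = BASE + VS17
print(len(sequence))
print(has_bare_base(BASE), has_bare_base(sequence))
```

PostgreSQL's regex operators may accept a similar lookahead constraint,
but that is not verified here; the sketch only demonstrates the idea.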

Maybe it would help if you run the strings through normalize() first?
I'm not sure if that can combine combining characters.
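
A quick way to probe that question outside the database is a sketch with
Python's unicodedata module. This is not PostgreSQL's normalize(); the
assumption is that both follow the same Unicode normalization forms.
Under that assumption, normalization composes ordinary combining marks
but leaves a variation selector as a separate code point (U+E0100 is
again an assumed example selector).

```python
import unicodedata

BASE = "\u3436"                  # the ideograph from the thread
WITH_VS = BASE + "\U000E0100"    # followed by VARIATION SELECTOR-17 (assumed example)

# An ordinary combining sequence is composed by NFC:
# "e" + COMBINING ACUTE ACCENT becomes the single code point U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")

# A variation sequence is left untouched by every normalization form,
# so it still consists of two code points afterwards.
unchanged = {
    form: unicodedata.normalize(form, WITH_VS) == WITH_VS
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print(len(composed), unchanged)
```

If PostgreSQL behaves the same way, normalizing first would not make the
variation sequence match as a single character.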

regards, tom lane
