From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: collate not support Unicode Variation Selector |
Date: | 2022-08-03 02:02:08 |
Message-ID: | CA+hUKGK31mD5JUfTwnq5OuBx4i5eO2TaXSV1yvScQppG_Sx+Dg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 3, 2022 at 12:56 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maybe it would help if you run the strings through normalize() first?
> I'm not sure if that can combine combining characters.
I think the similarity between Latin combining characters and these
ideographic variation selectors ends there. I don't think there is a
single codepoint version of U&'\+003436' || U&'\+0E0101', unlike é,
so normalize() would have nothing to compose the pair into.
Variation selectors exist to control small differences in rendering for the
"same" character[1]. My computer doesn't even show the OP's example
glyphs as different (to my eyes, at least; I can see on a random
picture I found[2] that the one with the e0101 selector is supposed to
have a ... what do you call that ... a tiny gap :-)).
[1] http://www.unicode.org/reports/tr37/tr37-14.html
[2] https://glyphwiki.org/wiki/u3436
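
To make the normalize() point concrete, here is a minimal sketch using
Python's unicodedata module as a stand-in for PostgreSQL's normalize().
That the two behave alike here is an assumption, resting on both
implementing the standard Unicode normalization forms. The codepoints()
helper is just for illustration. NFC composes e + U+0301 into a single
é, but leaves the U+3436 + U+E0101 pair from this thread untouched.

```python
import unicodedata

def codepoints(s):
    """Illustrative helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]

# Latin case: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# A precomposed form (U+00E9) exists, so NFC collapses the pair.
latin = "e\u0301"
latin_nfc = unicodedata.normalize("NFC", latin)

# Ideographic variation sequence from the thread: U+3436 followed by
# the variation selector U+E0101. No precomposed code point exists,
# so NFC returns the sequence unchanged.
ivs = "\u3436\U000E0101"
ivs_nfc = unicodedata.normalize("NFC", ivs)

print(codepoints(latin_nfc))  # ['U+00E9']
print(codepoints(ivs_nfc))    # ['U+3436', 'U+E0101']
```

So if a comparison should treat the two forms as equal, normalization
alone will not get there; the selector is still present afterwards.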