From: | Gianni Ceccarelli <dakkar(at)thenautilus(dot)net> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: unicode match normal forms |
Date: | 2021-05-17 14:00:49 |
Message-ID: | 20210517150049.1fd73618@exelion |
Lists: | pgsql-general |
On 17 May 2021 13:27:40 -0000
hamann(dot)w(at)t-online(dot)de wrote:
> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers the decomposed form. Is there any
> way to make a semi-exact match that accepts both variants?
Actually, re-reading your request, you want to *find* both forms?
In that case, a function index (what the PostgreSQL docs call an
index on an expression) may be more useful::
    create index filename_normalized
        on that_table(normalize(filename));
then you can search::
    select *
      from that_table
     where normalize(filename) = normalize(?);
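To see why comparing the normalized values catches both spellings,
here is a minimal sketch. It assumes PostgreSQL 13 or later (where
normalize() is available) and a UTF8 database; the two literals are
only an illustration, the same "ä" written precomposed and decomposed.

```sql
-- U+00E4 is the precomposed "ä"; "a" followed by U+0308 (combining
-- diaeresis) is the decomposed spelling of the same letter.
select U&'\00E4' = U&'a\0308'                       as raw_equal,
       normalize(U&'\00E4') = normalize(U&'a\0308') as normalized_equal,
       U&'a\0308' is nfc normalized                 as decomposed_is_nfc;
```

With the usual deterministic collations the raw comparison should come
out false and the normalized one true, since normalize() defaults to
NFC and composes the second literal into U+00E4.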
If you want to make sure that no two rows contain "equally-looking"
filenames, you can use a unique index::
    create unique index filename_unique_normalized
        on that_table(normalize(filename));
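A quick way to check that the unique index does what you want is
sketched below. It assumes a scratch database where that_table has
just a text column filename (a made-up layout, adjust it to your real
table) and where the unique index above already exists.

```sql
-- Assumption: that_table(filename text) with the unique index on
-- normalize(filename) already in place.
insert into that_table (filename) values (U&'\00E4.txt');   -- precomposed ä
-- Same name with a decomposed ä: both rows normalize to the same
-- string, so this insert should fail with a unique violation.
insert into that_table (filename) values (U&'a\0308.txt');
```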
(While we're on the topic of "equally-looking" characters, you may
want to look at https://en.wikipedia.org/wiki/Homoglyph and
https://www.unicode.org/reports/tr36/ .)
--
Dakkar - <Mobilis in mobile>
GPG public key fingerprint = A071 E618 DD2C 5901 9574
6FE2 40EA 9883 7519 3F88
key id = 0x75193F88