Re: unicode match normal forms

From: Gianni Ceccarelli <dakkar(at)thenautilus(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: unicode match normal forms
Date: 2021-05-17 13:44:31
Message-ID: 20210517144431.561d90f2@exelion
Lists: pgsql-general

On 17 May 2021 13:27:40 -0000
hamann(dot)w(at)t-online(dot)de wrote:
> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers the decomposed form. Is there any way
> to make a semi-exact match that accepts both variants?

You should probably normalise the strings in whatever application code
handles the inserting. NFC is the "usually sensible" normal form to
use.

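If you can't change the application code, you could use a trigger
that applies the `normalize(text[,form])→text` function to the incoming
values. The function is available from PostgreSQL 13, needs a UTF8
server encoding, and defaults to NFC when no form is given:

https://www.postgresql.org/docs/13/functions-string.html#id-1.5.8.10.5.2.2.7.1.1.2

Something vaguely like this (totally untested!)::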

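    -- rewrite the incoming value into NFC before the row is stored
    create function normalize_filename() returns trigger as $$
    begin
        -- NFC is the default form; spelled out here for clarity
        new.filename := normalize(new.filename, NFC);
        return new;
    end;
    $$ language plpgsql;

    -- run the function for every row that is inserted or updated
    create trigger normalize_filename
        before insert or update
        on that_table
        for each row
        execute function normalize_filename();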
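
To go with the trigger, here is an equally untested sketch of how the
"semi-exact match" itself might look, plus a one-off clean-up for rows
stored before the trigger existed. It reuses the placeholder names
`that_table` and `filename` from above; the search term is made up for
illustration. It assumes PostgreSQL 13 or later, a UTF8 database, and
the default (deterministic) collation, under which the first comparison
should come out false and the second true.

```sql
-- Sketch only: that_table and filename are the placeholder names used in
-- the trigger; normalize() needs PostgreSQL 13+ and a UTF8 database.

-- 1. Precomposed "ä" (U+00E4) versus "a" + combining diaeresis (U+0308).
--    The raw strings hold different code points; their NFC forms match.
select U&'\00E4' = U&'a\0308'                                 as raw_equal,
       normalize(U&'\00E4', NFC) = normalize(U&'a\0308', NFC) as nfc_equal;

-- 2. Lookup that accepts either variant, even if stored rows have not
--    been normalised yet ('Bär.txt' is a hypothetical search term).
select *
from that_table
where normalize(filename, NFC) = normalize('Bär.txt', NFC);

-- 3. One-off clean-up of rows written before the trigger existed.
update that_table
set filename = normalize(filename, NFC)
where filename is not NFC normalized;
```

Wrapping the column in `normalize()` as in step 2 stops a plain index on
`filename` from being used. Once every stored value is in NFC (trigger
plus step 3), only the search term needs normalising.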

--
Dakkar - <Mobilis in mobile>
GPG public key fingerprint = A071 E618 DD2C 5901 9574
6FE2 40EA 9883 7519 3F88
key id = 0x75193F88
