Re: unicode match normal forms

From:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
To:	hamann(dot)w(at)t-online(dot)de
Cc:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: unicode match normal forms
Date:	2021-05-17 14:34:02
Message-ID:	48e7eaab-9403-4d65-8581-cd1e55231d28@manitou-mail.org
Lists:	pgsql-general

Hamann W wrote:

> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers the decomposed form. Is there any way
> to make a semi-exact match that accepts both variants?

Aside from normalizing both strings into the same normal form
before comparing, you can use a non-deterministic ICU collation, which
will recognize them as identical (they're "canonically equivalent" in
Unicode terms).

For instance,

CREATE COLLATION nd (
  provider = 'icu',
  locale = '',
  deterministic = false
);

SELECT
  nfc_form,
  nfd_form,
  nfc_form = nfd_form COLLATE nd AS equal1,
  nfc_form = nfd_form COLLATE "C" AS equal2  -- or any deterministic collation
FROM
  (VALUES
    (E'j\u00E4hrlich',
     E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);
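
If the comparison is needed routinely, the collation can also be attached
to a column, so that a plain = accepts both variants without a COLLATE
clause in every query. This is only a sketch: the table and column names
(words, word) are made up for illustration, and it assumes the collation
nd from above already exists in a UTF8 database.

```sql
-- Hypothetical table, for illustration only; relies on the "nd"
-- collation created earlier and on a UTF8 server encoding.
CREATE TABLE words (
  word text COLLATE nd
);

-- Store the composed (NFC) spelling.
INSERT INTO words VALUES (E'j\u00E4hrlich');

-- Search with the decomposed (NFD) spelling. The column's
-- non-deterministic collation is used for the comparison.
SELECT word
FROM words
WHERE word = E'j\u0061\u0308hrlich';
```

With the non-deterministic collation on the column, the lookup should
find the row even though the two spellings differ byte for byte.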

Normalizing is available as a built-in function (normalize) since
Postgres 13, and non-deterministic collations appeared in Postgres 12.
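
For the normalizing approach, a minimal sketch using the same two sample
strings could look like the following. It assumes Postgres 13 or later
and a UTF8 server encoding; the output column names are only
illustrative.

```sql
SELECT
  -- Bring both sides to NFC before comparing.
  normalize(nfc_form, NFC) = normalize(nfd_form, NFC) AS equal_after_nfc,
  -- Check whether the decomposed input is already in NFC.
  nfd_form IS NFC NORMALIZED AS nfd_is_nfc
FROM
  (VALUES
    (E'j\u00E4hrlich',
     E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);
```

Normalizing on input (for example in a trigger or in the application)
keeps comparisons on ordinary deterministic collations, at the price of
having to normalize every search term the same way.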

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

In response to

unicode match normal forms at 2021-05-17 13:27:40 from hamann.w
