Re: unicode match normal forms

From: Gianni Ceccarelli <dakkar(at)thenautilus(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: unicode match normal forms
Date: 2021-05-17 14:04:50
Message-ID: 20210517150450.50e499f1@exelion
Lists: pgsql-general

On Mon, 17 May 2021 15:45:00 +0200
Matthias Apitz <guru(at)unixarea(dot)de> wrote:
> There is only *one* codepoint for the German letter a Umlaut:
> LATIN SMALL LETTER A WITH DIAERESIS U+00E4

True. On the other hand, the sequence:

* U+0061 LATIN SMALL LETTER A
* U+0308 COMBINING DIAERESIS

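will normally render as exactly the same glyph. The two forms are
canonically equivalent: U+00E4 is the NFC form (Normalization Form C,
canonical composition), and U+0061 U+0308 is the NFD form
(Normalization Form D, canonical decomposition). Compared code point
by code point, though, they are different strings.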

See https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization
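
To make the difference concrete, here is a minimal sketch in Python
using only the standard library's unicodedata module. The helper name
same_text is made up for this example, and normalizing both sides to
NFC before comparing is just one reasonable choice, not the only one.

```python
import unicodedata

composed = "\u00e4"      # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (NFC)
decomposed = "a\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS (NFD)


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both
    to the same form, so canonically equivalent input compares equal."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain comparison looks at code points, so the two strings differ.
print(composed == decomposed)

# After normalizing both sides to the same form they compare equal.
print(same_text(composed, decomposed))

# Each form converts into the other.
print(unicodedata.normalize("NFD", composed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The first comparison is False and the other three are True, which is
why text coming from different sources should be brought to one normal
form before it is matched.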

--
Dakkar - <Mobilis in mobile>
GPG public key fingerprint = A071 E618 DD2C 5901 9574
6FE2 40EA 9883 7519 3F88
key id = 0x75193F88
