Re: Unicode grapheme clusters

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode grapheme clusters
Date:	2023-01-20 00:47:49
Message-ID:	Y8nktYFVf21NmmU+@momjian.us
Lists:	pgsql-hackers

On Thu, Jan 19, 2023 at 07:37:48PM -0500, Greg Stark wrote:
> This is how we've always documented it. Postgres treats code points as
> "characters" not graphemes.
>
> You don't need to go to anything as esoteric as emojis to see this either.
> Accented characters like é have canonical forms that are multiple code
> points, and in some character sets some accented characters can only be
> represented that way.
>
> But I don't think there's any reason to consider changing the existing functions.
> They have to be consistent with substr and the other string manipulation
> functions.
>
> We could add new functions to work with graphemes but it might bring more pain
> keeping it up to date....

I am not sure what you are referring to above? character_length? I was
talking about display length, and psql uses that --- at some point, our
lack of support for graphemes will cause psql to not align columns.
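
To make the distinction concrete, here is a minimal Python sketch, not
PostgreSQL or psql code, that compares code point count with an approximate
display width for the two forms of é. The helper name display_width and its
rules (skip combining marks, count East Asian wide characters as two columns)
are assumptions chosen for illustration only.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate terminal columns (hypothetical helper, illustration only)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        # Wide and fullwidth characters occupy two columns; everything else one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

composed = "\u00e9"      # é as a single code point (NFC)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (NFD)

# Both strings are canonically equivalent and render as the same glyph.
assert unicodedata.normalize("NFD", composed) == decomposed

print(len(composed), len(decomposed))                      # code point counts: 1 2
print(display_width(composed), display_width(decomposed))  # column counts: 1 1
```

Counting code points gives different answers for the two strings, while the
column count is the same, which is the quantity that matters for aligning
output. The sketch does not implement grapheme cluster segmentation: sequences
joined with ZERO WIDTH JOINER, such as many emoji, would still be measured
code point by code point.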

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

In response to

Re: Unicode grapheme clusters at 2023-01-20 00:37:48 from Greg Stark

Responses

Re: Unicode grapheme clusters at 2023-01-20 00:53:43 from Tom Lane
