Re: Unicode grapheme clusters

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode grapheme clusters
Date: 2023-01-24 19:20:32
Message-ID: Y9AvgA1+93WXp9gN@momjian.us
Lists: pgsql-hackers

On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead. But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
>
> Here's an interesting blog post about trying to do this in Rust:
>
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
>
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
>
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often. Anyway, psql assumes a fixed-width output device, so we can
base our width computations on that assumption. You are right that
Unicode doesn't seem to consider fixed-width output cases and doesn't
provide much guidance.
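
As a rough illustration of what a fixed-width assumption means in
practice, here is a minimal Python sketch that estimates columns one
code point at a time, similar in spirit to a wcwidth-style lookup. It is
not psql's code; the function name estimate_columns and the width rules
(2 for East Asian Wide/Fullwidth, 0 for combining marks and format
characters, 1 otherwise) are assumptions chosen for the example.

```python
import unicodedata


def estimate_columns(text: str) -> int:
    """Estimate terminal columns by summing a per-code-point width.

    Assumption for this sketch: East Asian Wide/Fullwidth code points
    take 2 columns, combining marks and format characters take 0, and
    everything else takes 1. Real terminals may disagree.
    """
    columns = 0
    for ch in text:
        # Combining marks (Mn, Me) and format characters such as the
        # zero-width joiner (Cf) are treated as occupying no column.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns


if __name__ == "__main__":
    for sample in ("abc", "e\u0301", "日本語"):
        print(repr(sample), estimate_columns(sample))
```

Because the sum is taken per code point, a multi-code-point emoji
sequence is counted as several wide characters even when a terminal
draws it as a single two-column glyph, which is the kind of mismatch the
quoted text describes.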

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?
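
To make the code point versus displayed character distinction concrete,
here is a small Python sketch, not PostgreSQL code. Python's len()
counts code points, much like the behavior described above for
character_length(). The helper approx_grapheme_count is hypothetical and
deliberately crude: it only folds combining marks and zero-width-joiner
sequences into the preceding character, and it ignores most of the
grapheme cluster rules in UAX #29 (Hangul jamo, regional indicators,
emoji modifiers).

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner, used to glue emoji sequences together


def approx_grapheme_count(text: str) -> int:
    """Crude approximation of user-perceived characters (assumption:
    only combining marks and ZWJ sequences extend a cluster)."""
    count = 0
    join_next = False
    for ch in text:
        if ch == ZWJ:
            # The next code point continues the current cluster.
            join_next = True
            continue
        extends = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if not extends and not join_next:
            count += 1
        join_next = False
    return count


if __name__ == "__main__":
    samples = {
        "plain ASCII": "abc",
        "e + combining acute": "e\u0301",
        "family emoji (ZWJ sequence)": "\U0001F468\u200d\U0001F469\u200d\U0001F467",
    }
    for label, s in samples.items():
        print(f"{label}: code points={len(s)}, "
              f"approx graphemes={approx_grapheme_count(s)}")
```

In this sketch the decomposed "e" plus acute accent is two code points
but one approximate grapheme, and the family emoji is five code points
but one approximate grapheme.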

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.
