Re: broken table formatting in psql

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: john(dot)naylor(at)enterprisedb(dot)com
Cc: pavel(dot)stehule(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: broken table formatting in psql
Date: 2022-09-02 08:19:42
Message-ID: 20220902.171942.797882791141403089.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 2 Sep 2022 13:43:50 +0700, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote in
> On Fri, Sep 2, 2022 at 12:17 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > Including them into unicode_combining_table.h actually worked, but I'm
> > > not sure it is valid to include Cf's among Mn/Me's..
>
> Looking at the definition, Cf means "other, format" category, "Format
> character that affects the layout of text or the operation of text
> processes, but is not normally rendered". [1]
>
> > UnicodeData.txt
> > 174:00AD;SOFT HYPHEN;Cf;0;BN;;;;;N;;;;;
> >
> > Soft-hyphen seems like not zero-width.. usually...
>
> I gather it only appears at line breaks, which I doubt we want to handle.

Yeah. Sounds reasonable. (Emacs always renders it, though..)

> > 0600;ARABIC NUMBER SIGN;Cf;0;AN;;;;;N;;;;;
> > 110BD;KAITHI NUMBER SIGN;Cf;0;L;;;;;N;;;;;
> >
> > Mmm. These look like they are not zero-width?
>
> There are glyphs, but there is something special about the first one:
>
> select U&'\0600';
>
> Looks like this in psql (substituting 'X' to avoid systemic differences):
>
> +----------+
> | ?column? |
> +----------+
> | X |
> +----------+
> (1 row)
>
> Copy from psql to vim or nano:
>
> +----------+
> | ?column? |
> +----------+
> | X |
> +----------+
> (1 row)
>
> ...so it does mess up the border the same way. The second
> (U&'\+0110bd') doesn't render for me.

Anyway, it inevitably depends on the rendering environment.

> > However, it seems like basically a win if we add the "Cf"s to the
> > "combining" table?
>
> There seems to be a case for that. If we did include those, we should
> rename the table to match.

Agreed:)
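
For illustration only (this is not the attached patch, and as far as I
know the real table is generated by a script rather than by C code),
here is a minimal sketch of the selection rule being discussed: take a
UnicodeData.txt line, read the general category from the third
semicolon-separated field, and accept Mn, Me, and now also Cf. The
function name line_is_nonspacing is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a UnicodeData.txt line has a general
 * category of Mn, Me, or Cf.  The general category is the third
 * semicolon-separated field of the line.
 */
static bool
line_is_nonspacing(const char *line)
{
	const char *first;
	const char *second;
	const char *end;
	const char *cat;

	/* Skip the code point field and the character name field. */
	first = strchr(line, ';');
	if (first == NULL)
		return false;
	second = strchr(first + 1, ';');
	if (second == NULL)
		return false;

	/* The category field runs from here to the next semicolon. */
	cat = second + 1;
	end = strchr(cat, ';');
	if (end == NULL || (size_t) (end - cat) != 2)
		return false;

	return strncmp(cat, "Mn", 2) == 0 ||
		strncmp(cat, "Me", 2) == 0 ||
		strncmp(cat, "Cf", 2) == 0;
}
```

With this rule, the SOFT HYPHEN, ARABIC NUMBER SIGN and KAITHI NUMBER
SIGN lines quoted above would all be selected, which is exactly why the
question of whether every Cf is really zero-width matters.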

> I found this old document from 2002 on "default ignorable" characters
> that normally have no visible glyph:
>
> https://unicode.org/L2/L2002/02368-default-ignorable.html

Mmm. Too old?

> If there is any doubt about including all of Cf, we could also just
> add a branch in wchar.c to hard-code the 200B-200F range.

If every approach is flawed to a similar extent, I think we should
use the authoritative data, at least as a first step. We might want
additional filtering on top of it, but that would probably be a
separate issue.
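
To make the two options concrete, here is a rough sketch, not the code
in the attached patch. All names (cp_interval, nonspacing_sample,
in_interval_table, in_hardcoded_range, sample_display_width) are
hypothetical, and the table is a tiny hand-picked excerpt rather than
the generated data. It only assumes a sorted table of non-overlapping
code point ranges searched by binary search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
typedef struct
{
	uint32_t	first;
	uint32_t	last;
} cp_interval;

/*
 * Hand-picked excerpt for illustration only; a real table would be
 * generated from UnicodeData.txt.  Must be sorted and non-overlapping.
 */
static const cp_interval nonspacing_sample[] = {
	{0x00AD, 0x00AD},			/* Cf: SOFT HYPHEN */
	{0x0300, 0x036F},			/* Mn: combining diacritical marks */
	{0x0600, 0x0605},			/* Cf: ARABIC NUMBER SIGN and following */
	{0x200B, 0x200F},			/* Cf: zero-width space .. RLM */
	{0x110BD, 0x110BD}			/* Cf: KAITHI NUMBER SIGN */
};

/* Option 1: table-driven lookup via binary search over the ranges. */
static bool
in_interval_table(uint32_t cp, const cp_interval *table, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < table[mid].first)
			hi = mid;
		else if (cp > table[mid].last)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/* Option 2: the hard-coded branch covering only 200B-200F. */
static bool
in_hardcoded_range(uint32_t cp)
{
	return cp >= 0x200B && cp <= 0x200F;
}

/* Width 0 for table members, 1 otherwise (wide characters ignored here). */
static int
sample_display_width(uint32_t cp)
{
	size_t		n = sizeof(nonspacing_sample) / sizeof(nonspacing_sample[0]);

	return in_interval_table(cp, nonspacing_sample, n) ? 0 : 1;
}
```

The hard-coded branch covers the 200B-200F characters but would leave
U+0600 at width 1, whereas the table-driven version follows whatever
the data says, including entries such as the soft hyphen that we may
later want to filter out.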

Attached is the first cut of that. (The commit message is not great,
though.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: v1-0001-Treat-Unicode-characters-of-category-Format-as-no.patch (text/x-patch, 8.2 KB)
