From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: broken table formatting in psql |
Date: | 2022-09-02 06:43:50 |
Message-ID: | CAFBsxsHU91b0FDevdO=JugYHMhBMym6k94aa-iqwtjPLFU5axA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Sep 2, 2022 at 12:17 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 01 Sep 2022 18:22:06 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > At Thu, 1 Sep 2022 15:00:38 +0700, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote in
> > > UnicodeData.txt has this:
> > >
> > > 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
> > > 200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
> > > 200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;
> > > 200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;
> > > 200F;RIGHT-TO-LEFT MARK;Cf;0;R;;;;;N;;;;;
> > >
> > > So maybe we need to take Cf characters in this file into account, in
> > > addition to Me and Mn (combining characters).
> >
> > Including them into unicode_combining_table.h actually worked, but I'm
> > not sure it is valid to include Cf's among Mn/Me's..
Looking at the definition, Cf is the "Other, format" category: "Format
character that affects the layout of text or the operation of text
processes, but is not normally rendered". [1]
> UnicodeData.txt
> 174:00AD;SOFT HYPHEN;Cf;0;BN;;;;;N;;;;;
>
> Soft-hyphen seems like not zero-width.. usually...
I gather it is only rendered at line breaks, which I doubt we want to handle.
> 0600;ARABIC NUMBER SIGN;Cf;0;AN;;;;;N;;;;;
> 110BD;KAITHI NUMBER SIGN;Cf;0;L;;;;;N;;;;;
>
> Mmm. These looks like not zero-width?
There are glyphs, but there is something special about the first one:
select U&'\0600';
Looks like this in psql (substituting 'X' to avoid systemic differences):
+----------+
| ?column? |
+----------+
| X |
+----------+
(1 row)
Copy from psql to vim or nano:
+----------+
| ?column? |
+----------+
| X |
+----------+
(1 row)
...so it does mess up the border the same way. The second
(U&'\+0110bd') doesn't render for me.
> However, it seems like basically a win if we include "Cf"s to the
> "combining" table?
There seems to be a case for that. If we did include those, we should
rename the table to match.
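
To make the idea concrete, here is a rough sketch of how a generator could pick
candidate entries out of UnicodeData.txt by general category. The third
semicolon-separated field is the category (Mn, Me, Cf, ...). The function names
are hypothetical and this is not the script PostgreSQL actually uses. Whether
every Cf character really belongs in the table is exactly the open question
above (soft hyphen, ARABIC NUMBER SIGN).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the code point (field 0) and the
 * two-letter general category (field 2) from one UnicodeData.txt line.
 * Returns false if the line does not have the expected shape.
 */
static bool
parse_unicode_data_line(const char *line, unsigned int *codepoint,
						char category[3])
{
	const char *name_sep = strchr(line, ';');	/* end of field 0 */
	const char *cat_start;
	const char *cat_end;

	if (name_sep == NULL)
		return false;

	cat_start = strchr(name_sep + 1, ';');		/* end of field 1 */
	if (cat_start == NULL)
		return false;
	cat_start++;

	cat_end = strchr(cat_start, ';');			/* end of field 2 */
	if (cat_end == NULL || cat_end - cat_start != 2)
		return false;

	/* field 0 is the code point in hexadecimal */
	if (sscanf(line, "%x", codepoint) != 1)
		return false;

	memcpy(category, cat_start, 2);
	category[2] = '\0';
	return true;
}

/*
 * Assumption for illustration: treat Mn, Me and Cf alike as candidates
 * for a zero-width table. Including all of Cf is not settled.
 */
static bool
is_zero_width_candidate(const char *category)
{
	return strcmp(category, "Mn") == 0 ||
		strcmp(category, "Me") == 0 ||
		strcmp(category, "Cf") == 0;
}
```

Fed the 200B line quoted earlier, this yields code point 0x200B with category
"Cf", which the second function accepts as a candidate.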
I found this old document from 2002 on "default ignorable" characters
that normally have no visible glyph:
https://unicode.org/L2/L2002/02368-default-ignorable.html
If there is any doubt about including all of Cf, we could also just
add a branch in wchar.c to hard-code the 200B-200F range.
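
Something along these lines is what I have in mind for the hard-coded branch.
This is only a sketch: sketch_dsplen() is a simplified stand-in I made up for
illustration, not the real width function in wchar.c, and it skips the
combining and East Asian width lookups entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true for U+200B..U+200F (ZERO WIDTH SPACE,
 * ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER, LEFT-TO-RIGHT MARK,
 * RIGHT-TO-LEFT MARK).
 */
static bool
is_zero_width_format(uint32_t ucs)
{
	return ucs >= 0x200B && ucs <= 0x200F;
}

/*
 * Simplified stand-in for a display-width function. Returns -1 for
 * control characters, 0 for zero-width characters, 1 otherwise.
 */
static int
sketch_dsplen(uint32_t ucs)
{
	if (ucs == 0)
		return 0;

	/* C0 and C1 control characters have no defined display width */
	if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
		return -1;

	/* the hard-coded branch under discussion */
	if (is_zero_width_format(ucs))
		return 0;

	/* real code would consult the combining and wide-character tables here */
	return 1;
}
```

The upside is that nothing outside that one range changes behavior; the
downside is that other invisible Cf characters stay at width 1.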
--
John Naylor
EDB: http://www.enterprisedb.com