Re: Doc: typo in config.sgml

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: nagata(at)sraoss(dot)co(dot)jp, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Doc: typo in config.sgml
Date: 2024-10-10 03:08:13
Message-ID: ZwdFHW11D9jv778P@momjian.us
Lists: pgsql-hackers

On Wed, Oct 9, 2024 at 11:49:29AM +0900, Tatsuo Ishii wrote:
> >> On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
> >> > > On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> >> > >
> >> > >>>> I think there's an unnecessary underscore in config.sgml.
> >> > >
> >> > > I was wrong. The particular byte sequence just looked like an underscore
> >> > > in my editor, but the byte sequence is actually 0xc2a0, which must be a
> >> > > "non-breaking space" encoded in UTF-8. I guess someone mistakenly
> >> > > inserted a non-breaking space while editing config.sgml.
> >> >
> >> > I wonder if it would be worth adding a check for this like we have for tabs?
> >> > The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> >> > (doing so made me realize we don't have an equivalent meson target).
> >>
> >> Can we check for any character outside the supported range of SGML?
> >
> > How can we define the range of allowed characters in SGML?
> >
> > We can detect non-ASCII characters by using regexp /\P{ascii}/ or /[^\x00-\x7f]/,
> > but they are used in some places in charset.sgml and some names in release-*.sgml.
>
> I failed to find any standard regarding what characters are allowed in
> SGML/XML. Assuming that any valid Unicode characters are allowed in
> our *sgml files, I am afraid the best we can do is to grep the files
> for non-ASCII characters and check the results by visual
> inspection. Besides nbsp, there are tons of confusing Unicode
> characters out there. For example, there are many "hyphen-like
> characters".
>
> https://www.compart.com/en/unicode/category/Pd
>
> If one of them is used in the sgml files, it may be possible that it
> was accidentally inserted.

Can we use Unicode in the SGML files?
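
For reference, the non-ASCII grep discussed above could be sketched
roughly as follows. This is only an illustration in Python, not the
rule from the attached patch: the function name find_non_ascii and the
sample string are made up here, and it uses the /[^\x00-\x7f]/ pattern
quoted above. It reports each hit with its Unicode name so that nbsp
or a hyphen look-alike is easy to spot during visual inspection.

```python
import re
import unicodedata

# Same character class as the /[^\x00-\x7f]/ regexp quoted in the thread.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def find_non_ascii(text):
    """Return (line, column, char, unicode_name) for every non-ASCII character.

    Lines and columns are 1-based. Lines are split on "\n" only, so that
    unusual Unicode line separators are reported instead of being swallowed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            ch = match.group()
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((line_no, match.start() + 1, ch, name))
    return hits


# Hypothetical sample: the bytes 0xc2 0xa0 decode to U+00A0 (nbsp) in UTF-8.
sample = b"max_connections\xc2\xa0(integer)\nplain ASCII line\n".decode("utf-8")
for line_no, col, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: U+{ord(ch):04X} {name}")
```

Since charset.sgml and the release-*.sgml files contain legitimate
non-ASCII characters, the output of something like this would still
need to be reviewed by hand or filtered through an allowlist.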

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
