Re: Doc: typo in config.sgml

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: nagata(at)sraoss(dot)co(dot)jp
Cc: bruce(at)momjian(dot)us, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Doc: typo in config.sgml
Date: 2024-10-09 02:49:29
Message-ID: 20241009.114929.1396811718032371256.ishii@postgresql.org
Lists: pgsql-hackers

> On Mon, 7 Oct 2024 15:45:54 -0400
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>> On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
>> > > On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> > >
>> > >>>> I think there's an unnecessary underscore in config.sgml.
>> > >
>> > > I was wrong. The particular byte sequence just looked like an underscore
>> > > in my editor, but the byte sequence is actually 0xc2a0, which must be a
>> > > "non-breaking space" encoded in UTF-8. I guess someone mistakenly
>> > > inserted a non-breaking space while editing config.sgml.
>> >
>> > I wonder if it would be worth adding a check for this like we have for tabs?
>> > The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
>> > (doing so made me realize we don't have an equivalent meson target).
>>
>> Can we check for any character outside the supported range of SGML?
>
> How can we define the range of allowed characters in SGML?
>
> We can detect non-ASCII characters by using regexp /\P{ascii}/ or /[^\x00-\x7f]/,
> but they are used in some places in charset.sgml and some names in release-*.sgml.

I failed to find any standard regarding what characters are allowed in
SGML/XML. Assuming that any valid Unicode character is allowed in
our *.sgml files, I am afraid the best we can do is grep the files for
non-ASCII characters and check the results by visual
inspection. Besides nbsp, there are tons of confusing Unicode
characters out there. For example, there are many "hyphen-like
characters".

https://www.compart.com/en/unicode/category/Pd

If one of them appears in the sgml files, it may well have been
inserted accidentally.
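
To make the "grep and inspect" idea concrete, here is a minimal sketch in
Python. It is only an illustration, not a proposed patch: the function name
find_suspicious_chars and the three reason labels are hypothetical. It reports
every non-ASCII character with its position, and singles out the no-break
space (U+00A0) and anything in the Unicode "Pd" (dash punctuation) category so
that those stand out during visual inspection.

```python
import unicodedata


def find_suspicious_chars(text):
    """Return (line, column, codepoint, reason) for each non-ASCII character.

    Hypothetical helper: the labels below are just a convenience for
    sorting the output before a human looks at it.
    """
    findings = []
    # Split on "\n" only, so unusual Unicode line separators are reported
    # as characters instead of being silently consumed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) < 0x80:
                continue  # plain ASCII, nothing to report
            if ch == "\u00a0":
                reason = "no-break space"
            elif unicodedata.category(ch) == "Pd":
                reason = "dash punctuation (Pd)"
            else:
                reason = "other non-ASCII"
            findings.append((line_no, col, f"U+{ord(ch):04X}", reason))
    return findings


if __name__ == "__main__":
    sample = "max_connections\u00a0= 100\nrange 1\u20135\n"
    for finding in find_suspicious_chars(sample):
        print(finding)
```

Legitimate non-ASCII text, such as the names in release-*.sgml, would show up
as "other non-ASCII", so the output still needs a human to look at it.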

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
