From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> |
Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Doc: typo in config.sgml |
Date: | 2024-10-01 13:20:55 |
Message-ID: | 20241001222055.cbf86962216383e0476d41e1@sraoss.co.jp |
Lists: | pgsql-hackers |
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
> Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
> > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
> > >> point in Unicode. i.e. U+00A0).
> > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
> > >
> > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
> > > ([ and ] were not necessary.)
> > >
> > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
> > > but I think it is better to specify both LC_ALL=C and "\xC2\xA0" to make sure
> > > nbsp is detected.
> > >
> > > One problem is that the -P option is available only in GNU grep; grep on Mac doesn't support it.
> > >
> > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
> > >
> > > Maybe a better way is to use perl itself rather than grep, as follows.
> > >
> > > `perl -ne '/\xC2\xA0/ and print' `
> > >
> > > I attached a patch that fixes it this way.
> >
> > GNU sed can also be used without setting LC_ALL:
> >
> > sed -n /"\xC2\xA0"/p
> >
> > However I am not sure if non-GNU sed can do this too...
>
> Although I haven't checked it myself, BSD sed doesn't support the \x escape according to [1].
>
> [1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>
> By the way, I've attached a slightly modified patch that uses the plural form in the
> message, the same as check-tabs.
>
> Non-breaking **spaces** appear in SGML/XML files
The previous patch was broken because the perl command failed to return the correct result.
I've attached an updated patch to fix the return value. In passing, I added line breaks
for long lines.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Attachment | Content-Type | Size |
---|---|---|
v4_check_nbsp.diff | text/x-diff | 1.1 KB |