From: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, tgl(at)sss(dot)pgh(dot)pa(dot)us, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Doc: typo in config.sgml |
Date: | 2024-11-19 02:29:07 |
Message-ID: | 20241119112907.1cdfe086df16e7ac173f9571@sraoss.co.jp |
Lists: | pgsql-hackers |
On Mon, 18 Nov 2024 16:04:20 -0500
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
> > On Tue, 5 Nov 2024 10:08:17 +0100
> > Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> >
> >
> > > >> So you convert LATIN1 characters to HTML entities so that it's easier
> > > >> to detect non-LATIN1 characters in the SGML docs? If my
> > > >> understanding is correct, it can also be achieved by using a tool
> > > >> like:
> > > >>
> > > >> iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
> > > >>
> > > >> If there are some non-LATIN1 characters in release-17.sgml,
> > > >> it will complain like:
> > > >>
> > > >> iconv: illegal input sequence at position 175
> > > >>
> > > >> An advantage of this is that we don't need to convert each LATIN1
> > > >> character to an HTML entity, which makes the sgml file authors' lives a
> > > >> little bit easier.
> >
> > > I think the iconv approach is an idea worth checking out.
> > >
> > > It's also not necessarily true that the set of characters provided by
> > > the built-in PDF fonts is exactly the set of characters in Latin 1. It
> > > appears to be close enough, but I'm not sure, and I haven't found any
> > > authoritative information on that.
> >
> > I found a description in the FAQ of Apache FOP [1] that explains that some
> > glyphs for the Latin-1 character set are not contained in the standard text fonts.
> >
> > The standard text fonts supplied with Acrobat Reader have mostly glyphs for
> > characters from the ISO Latin 1 character set. For a variety of reasons, even
> > those are not completely guaranteed to work, for example you can't use the fi
> > ligature from the standard serif font.
>
> So, the failure of ligatures is usually caused by not using the right
> Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
> rendering in PDFs but was always able to fix it by using the right AFM
> file. Odds are, the failure is caused by using a standard Latin1 AFM file
> and not the AFM file that matches the font being used.
>
> > [1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
> >
> > However, it seems that using iconv to detect non-Latin-1 characters may
> > still be useful because such characters are likely not displayed in PDF. For
> > example, we can do this in make check as the attached patch 0002. It cannot
> > show the filename where one is found, though.
>
> I was thinking something like:
>
> grep -l --recursive -P '[\x80-\xFF]' . |
> while read FILE
> do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
> done
>
> This only checks files with non-ASCII characters.
Checking for non-Latin-1 characters only in files that contain non-ASCII
characters seems like a good idea. I attached an updated patch (0002) that
uses Perl instead of grep because non-GNU grep may not support hex escape
sequences.
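For illustration, the combined check could look like the sketch below. This
is only my sketch of the idea, not necessarily what the attached patch does;
the function name and file arguments are placeholders. It narrows the file
list with a Perl one-liner (portable, unlike "grep -P") and then runs each
remaining file through iconv, so the offending file name can be reported.

```shell
# Sketch only; the attached patch may differ. Reject files containing
# characters outside Latin-1, reporting which file is at fault.
check_latin1() {
    for f in "$@"; do
        # Exit status 0 means the file is pure ASCII; skip the iconv step.
        perl -ne 'exit 1 if /[\x80-\xFF]/' "$f" && continue
        # Non-ASCII bytes present: verify they all convert to Latin-1.
        if ! iconv -f UTF-8 -t ISO-8859-1 "$f" > /dev/null 2>&1; then
            echo "non-Latin-1 character in $f" >&2
            return 1
        fi
    done
}
```

It would be invoked from the doc makefile as, e.g., "check_latin1 *.sgml".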
>
> > > Another approach for a fix would be
> > > to get FOP produce the required warnings or errors more reliably. I
> > > know it has a bunch of logging settings (ultimately via log4j), so there
> > > might be some possibilities.
> >
> > When a character that cannot be displayed in PDF is found, a warning
> > "Glyph ... not available in font ...." is output in fop's log. We can
> > prevent such characters from being contained in the PDF by checking
> > for that message, as in the attached patch 0001. However, this check runs
> > after the PDF is generated, since I could not figure out how to terminate
> > the generation immediately when such a character is detected.
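As a sketch of that post-generation check (file names are placeholders and
the attached patch may do this differently), the log scan could be:

```shell
# Sketch only; the attached patch may differ. Scan fop's log for the
# missing-glyph warning and fail if any character could not be rendered.
check_fop_log() {
    # fop logs "Glyph ... not available in font ..." for each character
    # it cannot render; running fop under LANG=C should keep this text
    # unlocalized.
    if grep -q 'not available in font' "$1"; then
        echo "ERROR: some characters have no glyph in the PDF fonts" >&2
        return 1
    fi
}

# Typical use after the build step, e.g.:
#   LANG=C fop -fo postgres-US.fo -pdf postgres-US.pdf 2> fop.log
#   check_fop_log fop.log
```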
>
> So, are we sure this will be the message even for non-English users? I
> thought checking for warning message text was too fragile.
I am not sure whether fop emits messages in languages other than English,
although I have never seen it output Japanese messages.
I wonder whether we can get consistent results by executing it with LANG=C.
The updated patch 0001 takes this approach.
Regards,
--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Attachment | Content-Type | Size |
---|---|---|
v2-0002-Check-non-latin1-characters-in-make-check.patch | text/x-diff | 1.7 KB |
v2-0001-Disallow-characters-that-cannot-be-displayed-in-P.patch | text/x-diff | 922 bytes |