Re: Doc: typo in config.sgml

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, tgl(at)sss(dot)pgh(dot)pa(dot)us, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Doc: typo in config.sgml
Date: 2024-11-11 13:02:15
Message-ID: 20241111220215.8aef4df0f8d541df61abad57@sraoss.co.jp
Lists: pgsql-hackers

On Tue, 5 Nov 2024 10:08:17 +0100
Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:

> >> So you convert LATIN1 characters to HTML entities so that it's easier
> >> to detect non-LATIN1 characters is in the SGML docs? If my
> >> understanding is correct, it can be also achieved by using some tools
> >> like:
> >>
> >> iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
> >>
> >> If there are some non-LATIN1 characters in release-17.sgml,
> >> it will complain like:
> >>
> >> iconv: illegal input sequence at position 175
> >>
> >> An advantage of this is, we don't need to covert each LATIN1
> >> characters to HTML entities and make the sgml file authors life a
> >> little bit easier.

> I think the iconv approach is an idea worth checking out.
>
> It's also not necessarily true that the set of characters provided by
> the built-in PDF fonts is exactly the set of characters in Latin 1. It
> appears to be close enough, but I'm not sure, and I haven't found any
> authoritative information on that.

I found a passage in the Apache FOP FAQ [1] explaining that some glyphs for the
Latin1 character set are not included in the standard text fonts:

  The standard text fonts supplied with Acrobat Reader have mostly glyphs for
  characters from the ISO Latin 1 character set. For a variety of reasons, even
  those are not completely guaranteed to work, for example you can't use the fi
  ligature from the standard serif font.

[1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters

However, using iconv to detect non-Latin1 characters may still be useful,
because such characters are unlikely to be displayed in the PDF. For example,
we can do this in make check, as in the attached patch 0002. It cannot show the
name of the file in which such a character is found, though.
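
To illustrate what per-file reporting could look like, here is a minimal Python
sketch. It is not what patch 0002 does (the patch uses iconv from make check);
the function names find_non_latin1 and check_files and the output format are
made up for this example. It treats any code point above U+00FF as outside
Latin1.

```python
from pathlib import Path


def find_non_latin1(text):
    """Return (line, column, char) for each character outside ISO-8859-1."""
    hits = []
    # Split on "\n" only, so that unusual separators such as U+2028
    # stay inside a line and are still examined.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:  # Latin1 covers U+0000 through U+00FF
                hits.append((lineno, col, ch))
    return hits


def check_files(paths):
    """Print one line per offending character; return how many were found."""
    count = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, col, ch in find_non_latin1(text):
            print(f"{path}:{lineno}:{col}: non-Latin1 character U+{ord(ch):04X}")
            count += 1
    return count
```

Calling check_files(["release-17.sgml"]) would print the file name, line, and
column of each offending character, and a nonzero return value could be used
to fail the check.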

> Another approach for a fix would be
> to get FOP produce the required warnings or errors more reliably. I
> know it has a bunch of logging settings (ultimately via log4j), so there
> might be some possibilities.

When a character that cannot be displayed in PDF is found, a warning
"Glyph ... not available in font ...." is written to fop's log. We can
keep such characters out of the PDF by checking for this message, as in
the attached patch 0001. However, the check runs only after the PDF has
been generated, since I could not find a way to terminate the generation
immediately when such a character is detected.
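
As a rough sketch of the log check (not the contents of patch 0001), the
following Python function picks out the lines carrying that warning. The
pattern is taken from the message quoted above; the function name
missing_glyph_warnings and the exact layout of fop's log lines are
assumptions.

```python
import re

# Based on the warning text quoted above. How fop prefixes or terminates
# the line is an assumption, so the pattern matches only the stable part.
GLYPH_WARNING = re.compile(r"Glyph .+ not available in font")


def missing_glyph_warnings(log_text):
    """Return the log lines that report a glyph missing from a font."""
    return [line for line in log_text.split("\n") if GLYPH_WARNING.search(line)]
```

Like the patch, this only works on a log that already exists, so it would run
after the PDF has been produced; a non-empty result could then be treated as a
build failure.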

Regards,
Yugo Nagata

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
0002-Check-non-latin1-characters-in-make-check.patch text/x-diff 1.6 KB
0001-Disallow-characters-that-cannot-be-displayed-in-PDF.patch text/x-diff 910 bytes
