Re: Doc: typo in config.sgml

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Bruce Momjian <bruce(at)momjian(dot)us>, Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, nagata(at)sraoss(dot)co(dot)jp, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Doc: typo in config.sgml
Date:	2024-11-05 09:08:17
Message-ID:	7491a14c-2215-46f0-87fe-ce30ae9eb4f6@eisentraut.org
Lists:	pgsql-hackers

On 02.11.24 14:18, Bruce Momjian wrote:
> On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
>>> Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
>>> LATIN1 characters we had with HTML entities, so there are none
>>> currently.
>>>
>>> I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
>>> so I added a cron job on my server to alert me when non-ASCII characters
>>> appear.
>>
>> So you convert LATIN1 characters to HTML entities so that it's easier
>> to detect non-LATIN1 characters in the SGML docs? If my
>> understanding is correct, it can also be achieved by using some tools
>> like:
>>
>> iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
>>
>> If there are some non-LATIN1 characters in release-17.sgml,
>> it will complain like:
>>
>> iconv: illegal input sequence at position 175
>>
>> An advantage of this is that we don't need to convert each LATIN1
>> character to an HTML entity, which makes the SGML file authors' lives a
>> little bit easier.
>
> I might have misread the feedback. I know people didn't want a Makefile
> rule to prevent it, but I thought converting the few UTF8 characters we
> had was acceptable. Let me think some more and come up with a patch.

The question of encoding characters as entities is orthogonal to the
issue of only allowing Unicode characters that have a mapping to Latin
1. This patch seems to confuse these two issues, and I don't think it
actually fixed the second one, which is the one that was complained
about. I don't think anyone actually complained about the first one,
which is the one that was actually patched.

I think the iconv approach is an idea worth checking out.
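
As a rough illustration of what such a check does, here is a minimal Python sketch that reports every character with no Latin-1 mapping, which is the same condition that makes the quoted iconv command fail. It is not part of any patch in this thread; the function name find_non_latin1 is hypothetical, and Python's "latin-1" codec is assumed to be an acceptable stand-in for iconv's ISO-8859-1 target.

```python
def find_non_latin1(text):
    """Return (line_no, column, char) for each character that cannot be
    encoded as Latin-1 (ISO-8859-1). Hypothetical helper for illustration."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                # Succeeds only for code points U+0000..U+00FF.
                ch.encode("latin-1")
            except UnicodeEncodeError:
                hits.append((line_no, col, ch))
    return hits
```

To check a file, one could read it with open(path, encoding="utf-8").read() and pass the result to the function; an empty list means every character has a Latin-1 mapping. Unlike iconv, which stops at the first offending byte position, this sketch lists all offending characters with line and column.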

It's also not necessarily true that the set of characters provided by
the built-in PDF fonts is exactly the set of characters in Latin 1. It
appears to be close enough, but I'm not sure, and I haven't found any
authoritative information on that. Another approach for a fix would be
to get FOP to produce the required warnings or errors more reliably. I
know it has a bunch of logging settings (ultimately via log4j), so there
might be some possibilities.

In response to

Re: Doc: typo in config.sgml at 2024-11-02 13:18:39 from Bruce Momjian

Responses

Re: Doc: typo in config.sgml at 2024-11-11 13:02:15 from Yugo Nagata
Re: Doc: typo in config.sgml at 2024-12-03 02:28:02 from Bruce Momjian
