Re: New Chinese FAQ

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Weiping <laser(at)qmail(dot)zhengmai(dot)net(dot)cn>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, PostgreSQL www <pgsql-www(at)postgresql(dot)org>
Subject: Re: New Chinese FAQ
Date: 2005-05-17 03:12:27
Message-ID: 200505170312.j4H3CRY25080@candle.pha.pa.us
Lists: pgsql-www

Magnus Hagander wrote:
> >> ok, I'll fix the html tag problem ASAP.
> >>
> >
> >I fixed the tag problem and it now verifies fine:
> >
> >
> >http://validator.w3.org/check?uri=http%3A%2F%2Fwwwmaster.postgresql.org%2Fdocs%2Ffaqs.FAQ_chinese.html&charset=gb2312+%28Chinese%2C+simplified%29
> >
> >The only problem reported is that it says the encoding is incorrect for
> >a large number of lines. The above encoding forces it to be gb2312. If
> >I make it Unicode I get even more failures. However, I remember iconv
> >doing the conversion to UTF8 just fine, so maybe something is wrong with
> >how we are validating it.
>
> The output should be UTF8, and it should autodetect it. The output from
> the *website* should *not* validate as gb2312, because it is no longer
> in that encoding.
>
> The reason that's the only error you get may be that it doesn't validate
> the document because of encoding errors. So this doesn't prove (or
> disprove for that matter) that the tags are fixed.
>

Yes, I was using the HTML 4.0 doctype when I tested; the file was only
checked as XHTML Transitional once it was on the web site.

> >Anyway, the HTML is OK so it seems we just have encoding issue now.
> >The current version in CVS is all fixed up so please submit updates
> >based on that version. Thanks.
>
> I'm sorry to say, but there are invalid characters in it again :-(
> On svr2:
> svr2# iconv -f gb2312 -t utf-8 FAQ_chinese.html >/dev/null
> iconv: FAQ_chinese.html: cannot convert
>
>
> On developer.pgadmin.org:
> mha(at)developer:~/ext/faqs$ iconv -f gb2312 -t utf-8 FAQ_chinese.html -o /dev/null
> iconv: illegal input sequence at position 8182
>
>
> Could it be cvs that messes the encoding up? Can you mail me the file as
> you see it before you commit and I can see if that makes a difference?
>

The problem is that the document is clearly not XHTML, but when I use
htmltidy -raw -asxhtml to convert it to XHTML, it somehow messes up the
encodings and then iconv fails. So, I either have to manually fix the
HTML file to be XHTML, or I have to figure out why htmltidy is changing
the encoded text even though I am using -raw.
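
To narrow down which bytes are at fault, something like the following
could list the offending sequences instead of stopping at the first one.
This is only a rough sketch, assuming Python 3 is available; the function
name find_invalid_sequences and the limit parameter are made up for
illustration. The offsets are byte offsets into the file, but Python's
gb2312 codec and iconv are separate implementations, so they may not flag
exactly the same bytes.

```python
import sys


def find_invalid_sequences(data, encoding="gb2312", limit=10):
    """Return up to `limit` (byte_offset, offending_bytes) pairs that fail to decode."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            # Decode the remainder; success means no further bad bytes.
            data[pos:].decode(encoding)
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift by pos.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning just past the bad bytes
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_encoding.py FAQ_chinese.html
    with open(sys.argv[1], "rb") as handle:
        for offset, raw in find_invalid_sequences(handle.read()):
            print(offset, raw)
```

Running it on the file before and after htmltidy should show whether the
bad bytes are already in the source or only appear after the conversion.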

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
