From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Magnus Hagander <mha(at)sollentuna(dot)net> |
Cc: | Weiping <laser(at)qmail(dot)zhengmai(dot)net(dot)cn>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, PostgreSQL www <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: New Chinese FAQ |
Date: | 2005-05-17 03:12:27 |
Message-ID: | 200505170312.j4H3CRY25080@candle.pha.pa.us |
Lists: | pgsql-www |
Magnus Hagander wrote:
> >> ok, I'll fix the html tag problem ASAP.
> >>
> >
> >I fixed the tag problem and it now verifies fine:
> >
> >
> >http://validator.w3.org/check?uri=http%3A%2F%2Fwwwmaster.postgresql.org%2Fdocs%2Ffaqs.FAQ_chinese.html&charset=gb2312+%28Chinese%2C+simplified%29
> >
> >The only problem reported is that it says the encoding is incorrect for
> >a large number of lines. The above encoding forces it to be gb2312. If
> >I make it Unicode I get even more failures. However, I remember iconv
> >doing the conversion to UTF8 just fine, so maybe something is wrong with
> >how we are validating it.
>
> The output should be UTF8, and it should autodetect it. The output from
> the *website* should *not* validate as gb2312, because it is no longer
> in that encoding.
>
> The reason that's the only error you get may be that it doesn't validate
> the document because of encoding errors. So this doesn't prove (or
> disprove for that matter) that the tags are fixed.
>
Yes, I was using the HTML 4.0 doctype when I tested. It was only once the
file was on the web site that it was checked as XHTML Transitional.
> >Anyway, the HTML is OK so it seems we just have an encoding issue now.
> >The current version in CVS is all fixed up so please submit updates
> >based on that version. Thanks.
>
> I'm sorry to say, but there are invalid characters in it again :-(
> On svr2:
> svr2# iconv -f gb2312 -t utf-8 FAQ_chinese.html >/dev/null
> iconv: FAQ_chinese.html: cannot convert
>
>
> On developer.pgadmin.org:
> mha(at)developer:~/ext/faqs$ iconv -f gb2312 -t utf-8 FAQ_chinese.html -o /dev/null
> iconv: illegal input sequence at position 8182
>
>
> Could it be cvs that messes the encoding up? Can you mail me the file as
> you see it before you commit and I can see if that makes a difference?
>
The problem is that the document is clearly not XHTML, but when I use
htmltidy -raw -asxhtml to convert it to XHTML, it somehow messes up the
encodings and then iconv fails. So, I either have to manually fix the
HTML file to be XHTML, or I have to figure out why htmltidy is changing
the encoded text even though I am using -raw.
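
One way to narrow this down would be to list the byte offsets that fail to
decode, once for the file before htmltidy and once for the file after it,
and compare the two lists. The following is only a rough sketch in Python:
the function name find_decode_errors and its limit parameter are made up
for illustration, and Python's gb2312 codec may not reject exactly the same
sequences as iconv, so the offsets need not match iconv's "position 8182".

```python
def find_decode_errors(data: bytes, encoding: str = "gb2312", limit: int = 20):
    """Return a list of (byte_offset, offending_bytes) for undecodable input.

    Hypothetical helper: it resumes right after each bad sequence, which is
    the same place Python's own "replace" error handler would resume.
    """
    errors = []
    pos = 0
    while pos < len(data) and len(errors) < limit:
        try:
            data[pos:].decode(encoding)
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as exc:
            start = pos + exc.start
            end = max(pos + exc.end, start + 1)  # always make progress
            errors.append((start, data[start:end]))
            pos = end
    return errors


# Small in-memory demonstration: valid GB2312 text with one stray byte.
sample = "中文".encode("gb2312") + b"\xff" + b"ok"
for offset, bad in find_decode_errors(sample):
    print(f"offset {offset}: {bad.hex()}")
```

For the real file, the bytes would come from something like
open("FAQ_chinese.html", "rb").read(). If the copy from before htmltidy
gives an empty list and the copy after it does not, that would point at
htmltidy rather than cvs.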
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073