From: | Karl Denninger <karl(at)denninger(dot)net> |
---|---|
To: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Encoding change question... |
Date: | 2010-08-16 17:40:03 |
Message-ID: | 4C6977F3.8080607@denninger.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
So I have myself a nice pickle here.....
I've got a database which was originally created with SQL_ASCII for the
encoding (anything goes text fields)
Unfortunately, I have a bunch of data that was encoded in UTF-8 that's
in an RSS feed that I need to load into said database. iconv barfs all
over this file in an attempt to turn it into ISO-8859 (which is what the
web application currently attached to that database is emitting and
collecting.) It appears the problem is (mostly) things like the
stylized double-quotes.
So I figured I'd go the other way, and convert what I have now in the
tables into UTF8.
Well, except that doesn't work either.
ticker=# select convert_to(subject, 'utf8') from post where
ordinal='2098167';
convert_to
----------------------------------------------------------------
1%: the interest rate on IBM\222s most recent three-year bond.
(1 row)
\222 is the correct code point for the styled single apostrophe that is
in that place in ISO-8859-1 in the source. However, the UTF prefix is
missing, as are the other two code-point characters (that is, I got the
code point but not the other two bytes that should be in front of it.
And if I set the code page on the web site to UTF-8, and also set the
encoding on the SQL session to UTF-8 I don't get the three-byte code - I
just get the one byte. That's a bust.
There are TWO fields in this database that need converted. I presumed I
could do something like this:
# update post set field1 = convert_to(field1, 'utf8');
It runs to completion without complaint but produces the above. No good.
So.... is there a way to do this? I do NOT want to dump, iconv on the
dumped file, then reload the database if I can possibly avoid it. Yes,
I know that will work (I've tested it on my development box), but being
able to do this "hot" would be DRAMATICALLY preferred. If I do the
iconv route I am basically rebuilding the entire database with all
that's involved in doing so in terms of downtime and such. VERY
undesirable.
(Equally-valid would be a way to convert UTF-8 to ISO-8859-1 (Latin1),
thereby allowing me to convert the incoming data stream to what's
already in the system, but thus far I've found no joy on that at all.)
Ideas?
-- Karl
Attachment | Content-Type | Size |
---|---|---|
karl.vcf | text/x-vcard | 124 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter C. Lai | 2010-08-16 17:46:21 | Re: Encoding change question... |
Previous Message | Alban Hertroys | 2010-08-16 17:07:05 | Re: good exception handling archiecutre |