From: | "Peter C(dot) Lai" <peter(at)simons-rock(dot)edu> |
---|---|
To: | Karl Denninger <karl(at)denninger(dot)net> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Encoding change question... |
Date: | 2010-08-16 17:46:21 |
Message-ID: | 20100816174620.GW35187@cesium.hyperfine.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
The doublequotes isn't UTF8 it's people copying and pasting from Microsoft
stuff, which is WIN-1252. So try to use that with iconv instead of utf8
On 2010-08-16 12:40:03PM -0500, Karl Denninger wrote:
> So I have myself a nice pickle here.....
>
> I've got a database which was originally created with SQL_ASCII for the
> encoding (anything goes text fields)
>
> Unfortunately, I have a bunch of data that was encoded in UTF-8 that's
> in an RSS feed that I need to load into said database. iconv barfs all
> over this file in an attempt to turn it into ISO-8859 (which is what the
> web application currently attached to that database is emitting and
> collecting.) It appears the problem is (mostly) things like the
> stylized double-quotes.
>
> So I figured I'd go the other way, and convert what I have now in the
> tables into UTF8.
>
> Well, except that doesn't work either.
>
> ticker=# select convert_to(subject, 'utf8') from post where
> ordinal='2098167';
> convert_to
> ----------------------------------------------------------------
> 1%: the interest rate on IBM\222s most recent three-year bond.
> (1 row)
>
> \222 is the correct code point for the styled single apostrophe that is
> in that place in ISO-8859-1 in the source. However, the UTF prefix is
> missing, as are the other two code-point characters (that is, I got the
> code point but not the other two bytes that should be in front of it.
> And if I set the code page on the web site to UTF-8, and also set the
> encoding on the SQL session to UTF-8 I don't get the three-byte code - I
> just get the one byte. That's a bust.
>
> There are TWO fields in this database that need converted. I presumed I
> could do something like this:
>
> # update post set field1 = convert_to(field1, 'utf8');
>
> It runs to completion without complaint but produces the above. No good.
>
> So.... is there a way to do this? I do NOT want to dump, iconv on the
> dumped file, then reload the database if I can possibly avoid it. Yes,
> I know that will work (I've tested it on my development box), but being
> able to do this "hot" would be DRAMATICALLY preferred. If I do the
> iconv route I am basically rebuilding the entire database with all
> that's involved in doing so in terms of downtime and such. VERY
> undesirable.
>
> (Equally-valid would be a way to convert UTF-8 to ISO-8859-1 (Latin1),
> thereby allowing me to convert the incoming data stream to what's
> already in the system, but thus far I've found no joy on that at all.)
>
> Ideas?
>
> -- Karl
>
> begin:vcard
> fn:Karl Denninger
> n:Denninger;Karl
> email;internet:karl(at)denninger(dot)net
> x-mozilla-html:TRUE
> version:2.1
> end:vcard
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
--
===========================================================
Peter C. Lai | Bard College at Simon's Rock
Systems Administrator | 84 Alford Rd.
Information Technology Svcs. | Gt. Barrington, MA 01230 USA
peter AT simons-rock.edu | (413) 528-7428
===========================================================
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew Wilson | 2010-08-16 17:49:14 | Re: How to refer to computed columns from other computed columns? |
Previous Message | Karl Denninger | 2010-08-16 17:40:03 | Encoding change question... |