From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Joel Fradkin <jfradkin(at)wazagua(dot)com>, "'Salem Berhanu'" <salemb4(at)hotmail(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] postgres & server encodings |
Date: | 2005-08-09 17:31:03 |
Message-ID: | 17412.1123608663@sss.pgh.pa.us |
Lists: | pgsql-admin pgsql-general |
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> The problem only shows up when you have mixed data -- say, you have two
> applications, one website in PHP which inserts data in Latin-1, and a
> Windows app which inserts in UTF-8. In this case your data will be a
> mess to fix, and there's no way a single conversion will get it right.
> You will have to manually separate the parts that are UTF8 from the
> Latin1, and import them separately. Not a position I'd like to be in.
The only helpful tip I can think of is that you can try to import data
into a UTF8 database and see if it gets rejected as badly encoded; this
will at least give you a weak tool to separate what's what.

I'm afraid the reverse direction won't help much --- in single-byte
encodings such as Latin1 there are no encoding errors, and so you can't
do any simple filtering to check in that direction. In the end you're
going to have to eyeball a lot of data for plausibility :-(

regards, tom lane
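
The filtering idea above can be tried outside the database as well. What follows is only a minimal sketch in Python, not something from the thread: the function name `split_by_utf8_validity` and the sample rows are made up for illustration, and Python's strict UTF-8 decoder is assumed to be a reasonable stand-in for the server's own encoding check (the two may differ in edge cases). It splits raw byte strings into those that decode as UTF-8 and those that are rejected, then shows why the reverse test is useless: every possible byte decodes as Latin-1 without error.

```python
def split_by_utf8_validity(rows):
    """Partition raw byte strings into (valid_utf8, rejected).

    Hypothetical helper: mimics "import into a UTF8 database and see
    what gets rejected" using Python's strict UTF-8 decoder.
    """
    valid, rejected = [], []
    for raw in rows:
        try:
            raw.decode("utf-8")  # strict error handling is the default
        except UnicodeDecodeError:
            rejected.append(raw)  # cannot be UTF-8; possibly Latin-1
        else:
            valid.append(raw)     # decodes as UTF-8, which is NOT proof
    return valid, rejected


# Made-up sample data: the same word written by two differently
# configured clients, plus a pure-ASCII row.
rows = [
    b"caf\xc3\xa9",   # "café" encoded as UTF-8
    b"caf\xe9",       # "café" encoded as Latin-1
    b"plain ascii",   # valid in both encodings, so it tells us nothing
]

valid, rejected = split_by_utf8_validity(rows)

# Reverse direction: Latin-1 assigns a character to every byte value,
# so decoding never fails and cannot be used as a filter.
all_bytes_as_latin1 = bytes(range(256)).decode("latin-1")

# UTF-8 data read as Latin-1 decodes "successfully" but looks wrong,
# which is why the remaining check is eyeballing for plausibility.
misread = b"caf\xc3\xa9".decode("latin-1")
```

The tool is weak in the sense described above: rows in the rejected list are definitely not UTF-8, but rows in the valid list (pure ASCII in particular) are not thereby proven to have been written as UTF-8.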