| From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
|---|---|
| To: | Alan Hodgson <ahodgson(at)simkin(dot)ca> |
| Cc: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: invalid byte sequence for encoding "UTF8" |
| Date: | 2007-03-21 19:57:22 |
| Message-ID: | 20070321195722.GC13787@svana.org |
| Lists: | pgsql-general |
On Wed, Mar 21, 2007 at 09:54:41AM -0700, Alan Hodgson wrote:
> iconv needs to read the whole file into RAM. What you can do is use the
> UNIX split utility to split the dump file into smaller segments, use iconv
> on each segment, and then cat all the converted segments back together into
> a new dump file. iconv is I think your best option for converting the dump
> to a valid encoding.
The guys at OpenStreetMap have written a UTF-8 cleaner that doesn't
read the whole file into memory:
http://trac.openstreetmap.org/browser/utils/planet.osm/C
Definitely more convenient for large files.
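For illustration only, here is a minimal sketch of the same idea in Python. It is not the OpenStreetMap tool's code; the function name clean_utf8_stream and its parameters are made up for this example. It assumes the dump is supposed to be UTF-8 already and that replacing (or dropping) the invalid bytes is acceptable. The file is processed in fixed-size chunks, so memory use stays constant regardless of dump size, and the incremental decoder carries a multi-byte sequence across chunk boundaries.

```python
import codecs
import sys


def clean_utf8_stream(src, dst, chunk_size=1 << 20, errors="replace"):
    """Copy binary stream src to dst, repairing invalid UTF-8 on the way.

    errors="replace" substitutes U+FFFD for each invalid sequence;
    errors="ignore" drops the offending bytes instead.
    (Hypothetical helper, not part of any existing tool.)
    """
    # The incremental decoder buffers an incomplete multi-byte sequence
    # at the end of one chunk and finishes it with the next chunk.
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush whatever is still buffered (e.g. a truncated final character).
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    # Usage: python clean_utf8.py < dump.sql > dump.clean.sql
    clean_utf8_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Note that this only helps when the data is mostly valid UTF-8 with a few stray bytes. If the dump is really in another encoding (LATIN1, say), converting it properly is the better fix.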
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.