From: | Rajesh Mallah <mallah_rajesh(at)yahoo(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Significance of Database Encoding |
Date: | 2005-05-16 02:16:50 |
Message-ID: | 20050516021650.78342.qmail@web31014.mail.mud.yahoo.com |
Lists: | pgsql-sql |
--- PFC <lists(at)boutiquenumerique(dot)com> wrote:
>
> > $ iconv -f US-ASCII -t UTF-8 < test.sql > out.sql
> > iconv: illegal input sequence at position 114500
> >
> > Any ideas how the job can be accomplished reliably?
> >
> > Also my database may contain data in multiple encodings
> > like WINDOWS-1251 and WINDOWS-1256 in various places,
> > as data has been inserted by different people using
> > different sources and client software.
>
> You could use a simple program like that (in Python):
>
> output = open( "unidump", "w" )
> for line in open( "your dump" ):
>     for encoding in "utf-8", "iso-8859-15", "whatever":
>         try:
>             output.write( unicode( line, encoding ).encode( "utf-8" ))
>             break
>         except UnicodeError:
>             pass
>     else:
>         print "No suitable encoding for line..."
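This may not work, because the conversion to UTF-8 can succeed (no runtime error)
even when the guess at the original encoding is wrong. A single-byte encoding such
as iso-8859-15 assigns a character to every byte value, so that guess never raises
an error, and the result is valid but incorrect UTF-8.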
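
To illustrate, here is a minimal sketch. It assumes Python 3 rather than the
Python 2 of the quoted script, and the sample string and variable names are
made up for the example. It shows the strict UTF-8 guess failing as hoped,
while the iso-8859-15 guess goes through silently and yields wrong text:

```python
# Hypothetical sample: Cyrillic text as a WINDOWS-1251 client would store it.
original = "Привет"
raw = original.encode("cp1251")

# Strict UTF-8 decoding rejects these bytes, so this guess is safely skipped.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# iso-8859-15 has a character for every byte value, so this never raises.
wrong = raw.decode("iso-8859-15")

# Re-encoding produces well-formed UTF-8 that holds the wrong characters.
wrong_utf8 = wrong.encode("utf-8")

print("utf-8 guess accepted:", utf8_ok)
print("iso-8859-15 guess matches original:", wrong == original)
print("result is still well-formed UTF-8:", wrong_utf8.decode("utf-8") == wrong)
```

So the try/except loop can only detect a wrong guess for encodings that
reject some byte sequences (such as UTF-8). It cannot tell WINDOWS-1251 from
WINDOWS-1256 or iso-8859-15 by errors alone.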
Regards,
Rajesh Kumar Mallah
>
> I'd say this might work, if UTF-8 cannot absorb an apostrophe inside a
> multibyte character. Can it?
>
> Or you could do that to all your table using SELECTs but it's going to be
> painful...