From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Client Encoding and Latin characters |
Date: | 2009-11-24 16:45:21 |
Message-ID: | 18502.1259081121@sss.pgh.pa.us |
Lists: | pgsql-general |
Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com> writes:
> My database is encoded UTF8. I recently was uploading (via COPY) some
> census data which included place names with , , , and other such
> characters. The upload choked on the Latin characters. Following the
> docs, I was able to fix this with:
> SET CLIENT_ENCODING TO 'LATIN1';
> COPY table FROM 'filename';
> After which I
> SET CLIENT_ENCODING TO 'UTF8';
> I typically use COPY FROM to bulk load data. My question is, is there
> any disadvantage to setting the default client_encoding as LATIN1? I
> expect to never be dealing with Asian languages, or most of the other
> LATINx languages. If I ever try to COPY FROM data incompatible with
> LATIN1, the command will just choke, and I can pick an appropriate
> encoding and try again, right?
Uh, no. You can pretty much assume that LATIN1 will take any random
byte string; likewise for any other single-byte encoding. UTF8 as a
default is a bit safer because it's significantly more likely that it
will be able to detect non-UTF8 input.
regards, tom lane
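
The point above can be checked at the byte level. The following is only a rough sketch: it uses Python's built-in codecs as a stand-in for the server's encoding checks (an assumption; PostgreSQL's own conversion routines are not involved), and `can_decode` is a hypothetical helper name. It suggests why a single-byte encoding such as LATIN1 cannot be relied on to "choke" on bad input, while UTF8 usually can.

```python
# Sketch: compare how a single-byte encoding and UTF-8 react to arbitrary bytes.
# Assumption: Python's codecs stand in for the database's encoding validation.

def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Every possible byte value, 0x00 through 0xFF, in one string.
all_bytes = bytes(range(256))

# The same text encoded two different ways.
latin1_text = "café".encode("latin-1")  # 'é' is the single byte 0xE9
utf8_text = "café".encode("utf-8")      # 'é' is the two bytes 0xC3 0xA9

# Latin-1 assigns a character to every byte, so nothing is ever rejected here.
print(can_decode(all_bytes, "latin-1"))

# UTF-8 has structural rules; a lone 0xE9 is an incomplete multi-byte sequence.
print(can_decode(latin1_text, "utf-8"))

# UTF-8 bytes read as Latin-1 are accepted silently, but the text is wrong.
print(utf8_text.decode("latin-1"))
```

In this sketch the Latin-1 check passes for every byte, the UTF-8 check fails on the Latin-1 bytes, and the last line prints mis-decoded text rather than raising an error. That silent mis-decoding is the risk of defaulting client_encoding to LATIN1: input in the wrong encoding would be loaded as garbage instead of failing.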