Re: Character Encoding Question

From: Don Parris <parrisdc(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Character Encoding Question
Date: 2013-03-29 14:39:59
Message-ID: CAJ-7yom8TOnO3=87BJRPBCOiwBUY=0iiwkfBQZVJc36Se_Er2g@mail.gmail.com
Lists: psycopg

On Fri, Mar 29, 2013 at 5:35 AM, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> wrote:

> On Fri, Mar 29, 2013 at 2:01 AM, Don Parris <parrisdc(at)gmail(dot)com> wrote:
>
> > Aha! As it turns out, I started looking into the character set support in
> > the postgresql documentation, and discovered the psql -l command. It showed
> > this test database is actually *not* encoded in UTF-8 at all, but rather in
> > ASCII. I am not sure how I managed to do that, but I did. I was sure I had
> > used the same DB creation script and just changed the DB name, but clearly,
> > I missed something. I am not sure if it is necessary to drop and re-create
> > the database to correct this, but that is what I have done.
> >
> > When I tried using \encoding or SET client_encoding, I got no errors, but I
> > still saw this test DB set as ASCII when running the psql -l command.
> > Anyway, I'll have to pursue this further later. Many thanks for the help!
>
> In this case you should convert your database to utf8 (because it
> contains utf8 data) asap. SQL_ASCII actually doesn't mean ASCII but
> means store whatever octet you throw at it as it is, it's more akin to
> binary data (but without the possibility to store 0x00). From your
> examples, and with some luck, your database may contain utf8 only
> data, but if you connect with different clients or encodings and feed
> some latin1 etc. the database will be just happy to accept everything,
> no question asked; just, it will be a nightmare to read the data back
> or to make it uniform later.
>
> If you don't have familiarity with encodings and relative problems,
> the Spolsky article is a nice introduction
> <http://www.joelonsoftware.com/articles/Unicode.html>.
>
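
To make the SQL_ASCII point above concrete, here is a minimal Python sketch. It does not talk to PostgreSQL at all; the list `stored_rows` and the helper `try_decode` are hypothetical names that only simulate a column which keeps whatever bytes each client sends. It shows why mixed input is hard to read back: no single encoding decodes every row correctly.

```python
# Simulate a SQL_ASCII column: the bytes are stored exactly as each client sent them.
word = "se\u00f1or"  # the Spanish word "senor" written with n-tilde (U+00F1)

stored_rows = [
    word.encode("utf-8"),    # row written by a client that used UTF-8
    word.encode("latin-1"),  # row written by a client that used Latin-1
]


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


for raw in stored_rows:
    # Reading as UTF-8 fails on the Latin-1 row; reading as Latin-1 never fails
    # but silently turns the UTF-8 row into mojibake.
    print(repr(raw), "utf-8:", repr(try_decode(raw, "utf-8")),
          "latin-1:", repr(try_decode(raw, "latin-1")))
```

Reading everything as Latin-1 is the more dangerous case, since it raises no error and the damage only shows up when someone looks at the text.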

Thanks Daniele,

I think I sent a follow-up post to this one saying that I have now
converted this db to UTF-8. I appreciate your help in tracking down what
the problem was, as well as the link to this article. Good reading for
sure. If I understand the article correctly, I can handle pretty much any
language (Korean, Bulgarian, Arabic, etc.) by using the UTF-8
encoding. Is that correct?
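
As a quick sanity check of that reading, here is a small Python sketch. It is only an illustration under my own assumptions: the `samples` dictionary and the helper `utf8_round_trip` are made-up names, and nothing here touches the database. It encodes text in several scripts to UTF-8 and decodes it back.

```python
# Sample text in several scripts; the dictionary is purely illustrative.
samples = {
    "Korean": "\ud55c\uad6d\uc5b4",
    "Bulgarian": "\u0431\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438",
    "Arabic": "\u0627\u0644\u0639\u0631\u0628\u064a\u0629",
    "Spanish": "castellano",
}


def utf8_round_trip(text):
    """Encode text to UTF-8 bytes and decode it back; return (bytes, text)."""
    encoded = text.encode("utf-8")
    return encoded, encoded.decode("utf-8")


for language, text in samples.items():
    encoded, decoded = utf8_round_trip(text)
    # The round trip is lossless for every script; only the byte count differs.
    print(language, len(text), "characters,", len(encoded), "bytes, same text:", decoded == text)
```

The character count and the byte count differ for the non-Latin scripts, which is worth remembering when sizing columns or slicing raw bytes.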

Incidentally, my code actually broke on records that were only in English.
Or at least that is how it appears. The particular table I was searching
on contains no non-English letters. It probably will contain non-English
characters in the future, but does not now.

I am very interested in being able to support multiple languages, as my
wife and I speak Castellano (Peruvian-flavored) and I speak a little German
and a few words in other languages. That's a topic for another day and
probably for another list, however. :-)

Again, many thanks to all of you for the help!

--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
http://dcparris.net/
<https://www.xing.com/profile/Don_Parris><http://www.linkedin.com/in/dcparris>
GPG Key ID: F5E179BE
