Re: encoding advice requested

From: Rick Schumeyer <rschumeyer(at)ieee(dot)org>
To: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: Daniel Verite *EXTERN* <daniel(at)manitou-mail(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: encoding advice requested
Date: 2006-11-13 13:39:24
Message-ID: 4558758C.3030705@ieee.org
Lists: pgsql-general

Albe Laurenz wrote:
>>> My database locale is en_US, and by default my databases are UTF8.
>>>
>>> My application code allows the user to paste text into a box and submit
>>> it to the database. Sometimes the pasted text contains non UTF8
>>> characters, typically the "fancy" forms of quotes and apostrophes. The
>>> database does not appreciate it when the application attempts to store
>>> these characters.
>>>
>>> What is the best option to deal with this problem?
>>>
>>> a) I think I could re-create the database with a LATIN1 encoding. I'm
>>> not real experienced with different encodings, are there any issues with
>>> combining en_US and LATIN1?
>>> b) I can issue a SET CLIENT_ENCODING TO 'LATIN1'; statement every time I
>>> open a connection. A brief test indicates this will work.
>>>
>> Be aware that "fancy" quotes and apostrophes are not representable in
>> LATIN1, the closest character set in which they are is probably
>> WIN1252. See http://en.wikipedia.org/wiki/Windows-1252, especially
>> characters in the 0x91-0x94 range.
>> Maybe your application implicitly uses this encoding, especially
>> if it runs under Windows, in which case the more appropriate
>> solution to your problem would be to set the client_encoding to
>> WIN1252 while keeping your database in UTF8.
>>
>
> This is good advice!
>
> To add an answer to your second question:
>
> You can
> ALTER ROLE username SET client_encoding = WIN1252
> to make this encoding the default for this user.
>
> If you want to change the setting for all users connecting
> to this database, you can also
> ALTER DATABASE mydb SET client_encoding = WIN1252
>
> Yours,
> Laurenz Albe
>
I will have to try the WIN1252 encoding.

On the client side, my application is a web browser. On the server
side, it is PHP scripts on a Linux box. The data comes from copying
text from a browser window (pointing to another web site) and pasting it
into an HTML textarea, which is then submitted.

Given this, would you still suggest the WIN1252 encoding?
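
For reference, here is a small sketch of the byte-level point made above
about the 0x91-0x94 range. It is written in Python rather than PHP purely
for illustration, and the variable names are my own; it only uses the
standard codecs, not anything PostgreSQL-specific. It assumes the pasted
text arrives as single bytes in that range, which I have not yet verified
for my setup.

```python
# Bytes 0x91-0x94: the "fancy" quotes as WIN1252 (Windows-1252) encodes them.
raw = bytes([0x91, 0x92, 0x93, 0x94])

# Decoded as WIN1252 they become the curly single and double quotes.
text = raw.decode("cp1252")
codepoints = [f"U+{ord(ch):04X}" for ch in text]

# The same bytes are not valid UTF-8 on their own, which would explain
# a UTF8 database rejecting them when the client claims to send UTF8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Read as LATIN1 the bytes decode without error, but to C1 control
# characters (U+0091..U+0094), not to quotes.
as_latin1 = [f"U+{ord(ch):04X}" for ch in raw.decode("latin-1")]

# LATIN1 has no curly quotes, so the real characters cannot be encoded in it.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Once decoded with the right encoding, conversion to UTF-8 is lossless.
utf8_bytes = text.encode("utf-8")

print(codepoints, as_latin1, valid_utf8, fits_latin1, utf8_bytes.hex())
```

If this is what is happening in my case, it may be why the brief LATIN1
test appeared to work: the bytes would be accepted, but possibly stored
as control characters rather than as quotes.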
