From: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "'Ken Tanzer *EXTERN*'" <ken(dot)tanzer(at)gmail(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Postgres, apps, special characters and UTF-8 encoding |
Date: | 2017-03-08 09:47:00 |
Message-ID: | A737B7A37273E048B164557ADEF4A58B53A0696B@ntex2010i.host.magwien.gv.at |
Lists: | pgsql-general |
Ken Tanzer wrote:
> Hi. I've got a recurring problem with character encoding for a Postgres-based web PHP app, and am
> hoping someone can clue me in or at least point me in the right direction. I'll confess upfront my
> understanding of encoding issues is extremely limited. Here goes.
>
> The app uses a Postgres database, UTF-8 encoded. Through their browsers, users can add and edit
> records often including text. Most of the time this works fine. Though sometimes this will fail with
> Postgres complaining, for example, "Could query with ... , The error text was: ERROR: invalid byte
> sequence for encoding "UTF8": 0xe9 0x20 0x67"
>
> So this generally happens when people copy and paste things out of their word documents and such.
>
> As I understand it, those are likely encoded in something non-UTF-8, like WIN-1251 or something. And
> that one way or another, the encoding needs to be translated before it can be placed into the
> database. I'm not clear how this is supposed to happen though. Automatically by the browser? Done
> in the app? Some other way? And if in the app, how is one supposed to know what the incoming
> encoding is?
>
> Thanks in advance for any help or pointers.

The byte sequence 0xe9 0x20 0x67 means "é g" in ISO-8859-1 and WINDOWS-1252,
so I think that your setup is as follows:

- The PHP application gets data encoded in ISO-8859-1 or WINDOWS-1252
  and tries to store it in a database.
- The PHP application has a database connection with client_encoding
  set to UTF8.

Then the database thinks it gets UTF-8 and will choke if it gets something
different.
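
The diagnosis above can be checked outside the database. What follows is only a minimal sketch in Python (not the PHP the application uses); the variable names are made up for illustration. It decodes the three bytes from the error message in each candidate encoding:

```python
# The three bytes from the error message: 0xe9 0x20 0x67
raw = bytes([0xE9, 0x20, 0x67])

# In the single-byte encodings, every byte maps to exactly one character.
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("cp1252")

# In UTF-8, 0xE9 (binary 1110 1001) announces a three-byte sequence, so the
# next two bytes would have to be continuation bytes (0x80-0xBF).
# 0x20 is a plain space, so the sequence is invalid.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_latin1), repr(as_cp1252), valid_utf8)
```

This is probably also why the error message lists three bytes: the lead byte 0xe9 promises a three-byte character, and the two bytes that follow do not fit.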

The solution:

- Make sure that your web application gets data in only one encoding.
- Set client_encoding to that encoding.
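
One way to approach the first point is to normalize all incoming text to a single encoding before it reaches the database. The following is a rough sketch in Python, not a definitive implementation: `to_utf8` is a hypothetical helper, and the choice of WINDOWS-1252 (`cp1252`) as the fallback is an assumption based on the bytes in the error message.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` as UTF-8 bytes (hypothetical helper, for illustration).

    Input that already is valid UTF-8 passes through unchanged. Anything
    else is assumed to be in `fallback` and is re-encoded.
    """
    try:
        raw.decode("utf-8")  # validation only; the result is discarded
        return raw
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is WINDOWS-1252. Bytes that are
        # undefined in that code page become U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace").encode("utf-8")


pasted = bytes([0xE9, 0x20, 0x67])  # the bytes from the error message
print(to_utf8(pasted))
```

A guess like this cannot be fully reliable, because some WINDOWS-1252 byte sequences are also valid UTF-8. If all input really does arrive in one legacy encoding, the other route is to leave the data alone and tell the server about it, for example with `SET client_encoding TO 'WIN1252';`, so that PostgreSQL does the conversion itself.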

Yours,
Laurenz Albe