Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Phoenix Kiula" <phoenix(dot)kiula(at)gmail(dot)com>
Cc: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>, "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"
Date: 2007-08-15 17:31:32
Message-ID: dcc563d10708151031j6b128165ic72256e99cf0c916@mail.gmail.com
Lists: pgsql-general

On 8/15/07, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
> On 15/08/07, Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com> wrote:
> > Hello,
> >
> > Actually I tried something like $str = @iconv("UTF-8", "UTF-8//IGNORE",
> > $str); when preparing the string for the SQL query, and it worked. There's
> > probably a better way in PHP to achieve this: simply change the default
> > values in php.ini for these parameters:
> >
> > mbstring.encoding_translation = On
> > mbstring.substitute_character = none
> >
> > and broken characters will be stripped automatically from the input
> > and output.
>
>
> Sadly, they don't always do that, not with Asian scripts.
>
> And I do not completely agree with the concept of GIGO that the other
> poster suggested. Sometimes you want the end-user's experience to
> be seamless. For example, in one of our web sites, we allow users to
> submit text through a bookmarklet, where the title of the webpage
> comes in rawurlencoded format. We try to rawurldecode() it on our end
> but most of the time the Asian interpretation is wrong. We have all
> the usual mbstring settings in php.ini. In this scenario, the user did
> not enter any garbage. Our application should have the ability to
> recognize the text. We do what we can with mb_convert...etc, but the
> database just throws an error.
>
> PGSQL really needs to get with the program when it comes to utf-8 input.

What, exactly, does that mean?

That PostgreSQL should take things in invalid utf-8 format and just store them?
Or that PostgreSQL should autoconvert from invalid utf-8 to valid
utf-8, guessing the proper codes?

Seriously, what do you want pgsql to do with these invalid inputs?
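
To make the choices concrete, here is a minimal sketch of what a client
could do with such input before it ever reaches the database. It is
written in Python rather than the PHP used upthread, the function names
are hypothetical, and it does not describe anything PostgreSQL does
internally. It only illustrates the three options: reject the bytes,
silently drop the invalid ones, or replace them with U+FFFD.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (the 'reject' check)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_invalid_utf8(raw: bytes) -> bytes:
    """Drop any bytes that are not part of a valid UTF-8 sequence.

    Roughly comparable to iconv's //IGNORE: data is lost silently.
    """
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def replace_invalid_utf8(raw: bytes) -> bytes:
    """Replace each invalid sequence with U+FFFD so the loss stays visible."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    good = "café".encode("utf-8")     # valid UTF-8
    bad = "café".encode("latin-1")    # same text, wrong encoding
    for sample in (good, bad):
        print(sample, is_valid_utf8(sample),
              strip_invalid_utf8(sample), replace_invalid_utf8(sample))
```

None of these can recover the character the user meant if the bytes were
in some other encoding to begin with; for that the sender has to know, or
be told, what the source encoding actually was.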
