Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: Ben <bench(at)silentmedia(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"
Date: 2007-08-15 18:55:14
Message-ID: 20070815185514.GC28485@svana.org
Lists: pgsql-general

On Thu, Aug 16, 2007 at 01:56:52AM +0800, Phoenix Kiula wrote:
> This is very useful, thanks. This would be "bytea"? Quick questions:
>
> 1. Even if it were bytea, would it work with regular SQL operators
> such as regexp and LIKE?

bytea is specifically designed for binary data, so it has all sorts
of quoting rules for dealing with embedded nulls and such. It's not
quite a drop-in replacement.

The earlier suggestion of SQL_ASCII is probably closer to what you
want. It does do regexes and LIKE, but postgres will treat all your
data as bytes. If you want your regexes to match Unicode character
classes, that's too bad; you can't have it both ways. Sorting goes in
byte order; you don't have a lot of choice there either.
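
To make the "treated as bytes" trade-off concrete, here is a rough
analogy in Python rather than in postgres itself. It is only a sketch:
the sample strings are made up, and Python's re module stands in for
the database's regex engine. It shows a character class that matches
an accented letter when the data is handled as characters but not when
it is handled as raw bytes, and what a plain byte-order sort looks like.

```python
import re

# Hypothetical sample value: "cafe" with an accented e (U+00E9).
# Encoded as UTF-8, the accented letter becomes two bytes (0xC3 0xA9).
text = "caf\u00e9"
raw = text.encode("utf-8")

# Character-aware matching: with a str pattern, \w covers Unicode letters.
char_match = re.fullmatch(r"\w+", text) is not None

# Byte-wise matching: with a bytes pattern, \w only covers ASCII
# [a-zA-Z0-9_], so the two bytes of the accented letter do not match.
byte_match = re.fullmatch(rb"\w+", raw) is not None

# Lengths differ as well: 4 characters versus 5 bytes.
char_len, byte_len = len(text), len(raw)

# Byte-order sorting: compare the encoded bytes, nothing locale-aware.
words = ["zebra", "Apple", "\u00e9clair", "apple"]
byte_sorted = sorted(words, key=lambda w: w.encode("utf-8"))

print(char_match, byte_match, char_len, byte_len)
print(byte_sorted)
```

In the byte-order result, all uppercase ASCII sorts before lowercase
and the accented word lands after "zebra", which is the kind of
ordering to expect when nothing knows about the encoding.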

> 2. Would tsearch2 work with bytea in the future as long as the stuff
> in it was text?

Doubt it, SQL_ASCII would work though.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
