From: | Jeff Davis <jdavis(at)laika(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | strange encoding behavior |
Date: | 2006-10-20 17:47:54 |
Message-ID: | 1161366474.21046.27.camel@dogma.v10.wvs |
Lists: | pgsql-general |
The following encoding behavior seems strange to me (v8.1.4). I have
read the docs, but I am still confused.
I have a UTF8 encoded database. I can do
=> SELECT '\xb9'::text;
But that seems to be the only way to get an invalid utf8 byte sequence
into a text type.
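
As a quick illustration outside of PostgreSQL, the sketch below shows why a lone 0xb9 byte is not valid UTF-8: bytes in the range 0x80-0xBF are continuation bytes and cannot start a character. The helper name `is_valid_utf8` is made up for this example, and the Latin-1 re-encoding step assumes, purely for illustration, that the byte came from a Latin-1 source; the actual origin of the application's data is not stated here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The single byte from the SELECT above.
lone = b"\xb9"

# Assumption for illustration only: treat the byte as Latin-1 text,
# then re-encode that text as UTF-8.
as_latin1 = lone.decode("latin-1")
reencoded = as_latin1.encode("utf-8")

print(is_valid_utf8(lone))       # the lone continuation byte is rejected
print(is_valid_utf8(reencoded))  # the re-encoded form is accepted
print(reencoded)                 # two-byte UTF-8 sequence for U+00B9
```

If the data really is in a single-byte encoding, converting it on the client side like this (or declaring the matching client encoding) is one way to avoid the conversion error; whether that applies depends on where the bytes originally came from.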
Even if I call PQexecParams and send the data in binary format (with
type text), I get a conversion error.
If I send the invalid byte in a raw PQexec query, I assume that
postgres tries to convert it to cstring first, causing the conversion
error. That means it's impossible to send any invalid UTF8 byte
sequence in a raw query (as a value, anyway), as far as I can tell.
What motivates this question is that I have an application inserting
these invalid bytes (using them in the raw query), and I am finding it
difficult to migrate to the UTF8-encoded database.
It seems strange that it's possible to put invalid utf8 byte sequences
in a text field, but only by using the E''-style escape sequences. The
only way I have found to do it using PQexecParams with the binary data
is something like:
=> SELECT textin(byteaout($1)); -- $1 is binary format, type bytea
So, if I were to sum this up in a single question, why does cstring not
accept invalid utf8 sequences? And if it doesn't, why are they allowed
in any text type?
Regards,
Jeff Davis
PS: I posted a similar question yesterday that included a lot of useless
information. I'm not trying to repost, I'm trying to focus my question a
little more.