From: | "hernan gonzalez" <hgonzalez(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | text and bytea |
Date: | 2008-02-21 16:34:15 |
Message-ID: | 48692c2d0802210834o68b3b705m3d3db620d5c9647d@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
(After dealing a while with this, and learning a little, I though of
post this as comment in the docs, but perhaps someone who knows better
can correct or clarify)
=====================================================================================================
The issues of charset encodings and the distinction between "text" and
"bytea" can led to some confusions. Say I have a database with UTF-8
default encoding.
db=# \encoding
UTF8
db=# create table chartest ( c text); -- just one text field
CREATE TABLE
db=# insert into chartest (c) values ('á'); -- just one non ascii
string: acute a
INSERT 0 1
db=# select c from chartest; -- to check our terminal works ok
c
---
á
(1 row)
db=# select ascii(c) from chartest; -- to check the "real" stored value
ascii
-------
225
(1 row)
OK, now let's try to use the to_ascii() function, useful for
accent-insentive queries...
db=# select to_ascii(c) from chartest;
ERROR: encoding conversion from UTF8 to ASCII not supported
That's right, the docs say that to_ascii() requires a string in LATIN9
encoding (and a few others)
How do we convert a string from UTF8 to LATIN9 ? Ah, here's is the
convert_to() function. Lets try:
db=# select to_ascii(convert_to(c,'LATIN9')) from chartest;
ERROR: function to_ascii(bytea) does not exist
LINE 1: select to_ascii(convert_to(c,'LATIN9')) from chartest;
^
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.
Oops. convert_to() returns a bytea (byte array), to_char() expects a
text, not quite the same thing.
Should we try an explicit cast ?
db=# select to_ascii(convert_to(c,'LATIN9')::text) from chartest;
ERROR: encoding conversion from UTF8 to ASCII not supported
Mmmm.... does not work, the casting assumes the default encoding (UTF8).
How do we convince postgresql to interpret a byte array (mind it:
already representing a text codification)
as a text? Looking at the string related functions, it seems that
encode() might do the trick.
We try the 'escape' option (not quite satisfactory, we are not trying
to escape anything but...)
db=# select to_ascii(encode(convert_to(c,'LATIN9'),'escape')) from chartest;
ERROR: encoding conversion from UTF8 to ASCII not supported
Damn, it seems that the encoding worked but I need to convince
to_ascii() that the text it receives is not UTF8.
Lets see:
db=# select to_ascii(encode(convert_to(c,'LATIN9'),'escape'),'LATIN9')
from chartest;
to_ascii
----------
a
(1 row)
Wow! It worked!
I'm not sure if this is the "correct" way of doing it. If so, it does
not seems very satisfaying to me. Besides being a little too
convoluted, the distintion of "text" vs "bytea" seems half baked... I
would have expect one of these alternatives:
1) "bytea" is just a bunch of arbitrary bytes, a "text" is the
codification of a string in a fixed encoding (may be that of the
server installation, or the database one, or even a postgresql
internally choosen). In this case, "texts" only exist in this
encoding, the conversions to-from LATIN1, etc are only to-from bytes
arrays. This is how Java works (and is fairly clean for me).
2) "bytea" is just a bunch of arbitrary bytes, a "text" is the
codification of a string in a arbitrary encoding, and the chosen
encoding is part of the text content. That is, postgresql knows (at
'runtime') the encoding of the string.
Instead, in Postgresql implementation (the gurus might correct me) a
"text" very similar to a bytea, except for some restrictions (null
bytes and invalid sequences). This IMHO results in some confusion and
clumsiness in the conversions and text functions.
=============================================
Hernán
From | Date | Subject | |
---|---|---|---|
Next Message | Terry Lee Tucker | 2008-02-21 16:44:25 | Re: Disable Triggers |
Previous Message | A.M. | 2008-02-21 16:26:33 | Re: Disable Triggers |