From: | Roland Glenn McIntosh <roland(at)steeltorch(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | SOLUTION: Insert a Euro symbol as UTF-8 from a latin1 charset. |
Date: | 2003-06-13 15:28:36 |
Message-ID: | 5.1.0.14.2.20030613112410.05ef2260@lnxmain |
Lists: | pgsql-hackers |
This is my solution / bug report / RFC cross-posted from [GENERAL] regarding insertion of hexadecimal characters from the command line.
-----------------------------------
Okay. I have NO IDEA why this works. If someone could enlighten me as to the math involved I'd appreciate it. First, a little background:
The Euro symbol is Unicode code point U+20AC (0x20AC). As I understand it, UTF-8 encoding is a way of representing most Unicode characters in two bytes, and most Latin characters in one byte.
The only way I have found to insert a euro symbol into the database from the command line psql client is this:
INSERT INTO mytable VALUES('\342\202\254');
I don't know why this works. In hex, those octal values are:
E2 82 AC
I don't know why my "20" byte turned into two bytes of E2 and 82. Furthermore, I was under the impression that a UTF-8 encoding of the Euro sign only took two bytes. Corroborating this assumption, upon dumping that table with pg_dump and examining the resultant file in a hex editor, I see this in that character position: AC 20
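For reference, the three bytes follow from the standard UTF-8 bit layout: code points from U+0800 through U+FFFF are written as 1110xxxx 10xxxxxx 10xxxxxx, so the 16 bits of 0x20AC are spread over three bytes rather than two. The sketch below is only an illustration (the class and method names are invented for this example). It builds the bytes by hand, compares them with Java's own encoder, and also prints the little-endian UTF-16 form of the same character, which is the two-byte sequence AC 20. Whether that explains what the hex editor displayed is an assumption, not something the dump itself confirms.

```java
import java.nio.charset.StandardCharsets;

public class EuroUtf8 {
    // Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes
    // using the layout 1110xxxx 10xxxxxx 10xxxxxx.
    static byte[] encodeThreeByte(int codePoint) {
        return new byte[] {
            (byte) (0xE0 | (codePoint >> 12)),          // top 4 bits
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),  // middle 6 bits
            (byte) (0x80 | (codePoint & 0x3F))          // low 6 bits
        };
    }

    // Format bytes as space-separated uppercase hex, e.g. "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built UTF-8 bytes for the Euro sign
        System.out.println(hex(encodeThreeByte(0x20AC)));
        // The same character through Java's UTF-8 encoder
        System.out.println(hex("\u20AC".getBytes(StandardCharsets.UTF_8)));
        // The same character as little-endian UTF-16 (two bytes)
        System.out.println(hex("\u20AC".getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Run as written, this prints E2 82 AC twice and then AC 20.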
Additionally, according to the psql online documentation and man page:
"Anything contained in single quotes is furthermore subject to C-like substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits (the character with the given decimal, octal, or hexadecimal code)."
Those digits *should* be interpreted as decimal digits, but they aren't. The man page for psql is either incorrect, or the implementation is buggy.
I did try the '\0x20AC' method, and '\0x20\0xAC' without success.
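A quick way to check how the digits in the working INSERT are being read is to convert them by hand. The sketch below (hypothetical class and method names, purely illustrative) parses each three-digit group as octal. Read that way, the groups give exactly the hex bytes listed above; read as decimal, 342 would not fit in a single byte at all.

```java
public class OctalEscapes {
    // Parse the digits of a backslash escape such as "342" as an octal byte value.
    static int octalToByte(String digits) {
        return Integer.parseInt(digits, 8);
    }

    public static void main(String[] args) {
        // The three escapes from the INSERT statement
        for (String d : new String[] {"342", "202", "254"}) {
            System.out.printf("\\%s = 0x%02X%n", d, octalToByte(d));
        }
    }
}
```

This prints \342 = 0xE2, \202 = 0x82 and \254 = 0xAC.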
It's worth noting that the field I'm inserting into lives in an SQL_ASCII database, and I'm reading my UTF-8 string back out of it via JDBC like this:
String value = new String( resultset.getBytes(1), "UTF-8");
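The read side can be tried without a database. In the minimal sketch below, a hard-coded byte array stands in for resultset.getBytes(1), on the assumption that the column holds exactly the bytes E2 82 AC. The class and method names are invented for the example, and StandardCharsets assumes Java 7 or later; the "UTF-8" string form used above behaves the same way.

```java
import java.nio.charset.StandardCharsets;

public class DecodeEuro {
    // Decode raw bytes as UTF-8, as the JDBC read above does with the column bytes.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for resultset.getBytes(1); assumed to be what the INSERT stored
        byte[] stored = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String value = decodeUtf8(stored);
        // One character, whose code point is 20ac
        System.out.println(value.length());
        System.out.println(Integer.toHexString(value.codePointAt(0)));
    }
}
```

The three stored bytes decode to a single character, U+20AC.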
Can anyone help me make sense of this mumbo jumbo?
-Roland