From: | "Francisco Figueiredo Jr(dot)" <francisco(at)npgsql(dot)org> |
---|---|
To: | Andreas Kretschmer <akretschmer(at)spamfence(dot)net> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier... |
Date: | 2011-03-16 02:37:06 |
Message-ID: | AANLkTimCMgg=2oTjYw37Rc=WPHZv7MLYsCGg3Zhobo2D@mail.gmail.com |
Lists: | pgsql-general |
Now, I'm using my dev machine.
With the tests I'm doing, I can see the following:
If I use:
select 'seléct' as "seléct";
the column name is returned correctly, as expected.
If I do:
select 'seléct' as seléct;
This is the sequence of bytes I receive from PostgreSQL for the column name:
byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 227
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t
The problem lies in byte 4.
According to [1], the first byte of a UTF-8 sequence defines how many
bytes make up the character. The problem is that 227 is 1110 0011 in
binary, so the UTF-8 decoder expects a 3-byte sequence when there are
actually only 2! :( This seems to me to be the root of the problem.
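To make the lead-byte rule concrete, here is a minimal Python sketch of how a decoder reads the expected sequence length from the first byte. The function name utf8_sequence_length is hypothetical, chosen only for this illustration, and the check is simplified: it looks at the bit pattern alone and does not reject lead bytes that a strict decoder would refuse for other reasons.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence should have, judging only
    by the bit pattern of its first byte (simplified illustration)."""
    if lead_byte < 0x80:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a lead byte.
    raise ValueError(f"{lead_byte} is not a valid UTF-8 lead byte")


for b in (195, 227):
    print(b, format(b, "08b"), utf8_sequence_length(b))
# 195 11000011 2
# 227 11100011 3
```

So a decoder that sees 227 will wait for two continuation bytes, while only one (169) follows before the plain ASCII "c".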
For the selected value, the correct bytes are returned:
byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 195
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t
Here 195 is 1100 0011, which announces a 2-byte sequence, and the
decoder can decode 195 169 to U+00E9, which is the character "é".
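The two byte sequences listed above can be checked with a short Python sketch. The variable names and the try_decode helper are made up for this example; the byte values are copied from the message.

```python
# Bytes received for the selected value and for the unquoted column name.
value_bytes = bytes([115, 101, 108, 195, 169, 99, 116])
identifier_bytes = bytes([115, 101, 108, 227, 169, 99, 116])


def try_decode(raw: bytes):
    """Decode strictly as UTF-8; return None if the bytes are not valid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_decode(value_bytes))       # seléct
print(try_decode(identifier_bytes))  # None: 227 announces 3 bytes, only 2 are present

# The two lead bytes differ in exactly one bit.
print(value_bytes[3] ^ identifier_bytes[3])  # 32
```

The last line is plain arithmetic: 195 and 227 differ only in bit 0x20. What that implies about the cause is not established in this message.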
Do you think this could be related to my machine? I'm using OS X 10.6.6
and I compiled PostgreSQL 9.0.1 from source.
Thanks in advance.
[1] - http://en.wikipedia.org/wiki/UTF-8
On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr.
<francisco(at)npgsql(dot)org> wrote:
> Hmmmmmmmm,
>
> What would change the encoding of the identifiers?
>
> Because on my dev machine which unfortunately isn't with me right now
> I can't get the identifier returned correctly :(
>
> I remember that it returns:
>
> test=*# select 'tést' as tést;
> tst
> ------
> tést
>
> Is there any config I can change at runtime in order to have it
> returned correctly?
>
> Thanks in advance.
>
>
> On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer
> <akretschmer(at)spamfence(dot)net> wrote:
>> Francisco Figueiredo Jr. <francisco(at)npgsql(dot)org> wrote:
>>
>>>
>>> What happens if you remove the double quotes in the column name identifier?
>>
>> the same:
>>
>> test=*# select 'tést' as tést;
>> tést
>> ------
>> tést
>> (1 Zeile)
>>
>>
>>
>> Andreas
>> --
>> Really, I'm not out to destroy Microsoft. That will just be a completely
>> unintentional side effect. (Linus Torvalds)
>> "If I was god, I would recompile penguin with --enable-fly." (unknown)
>> Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
--
Regards,
Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior