From: | Jean-Christian Imbeault <jc(at)mega-bucks(dot)co(dot)jp> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Invalid EUC_JP char seq bug? |
Date: | 2003-07-02 02:42:30 |
Message-ID: | 3F024696.3080305@mega-bucks.co.jp |
Lists: | pgsql-bugs |
Tatsuo Ishii wrote:
>
> Since you did not show us exact query you send to PostgreSQL
I can't show the exact query because it is generated by PHP. I can,
however, show you the code that generates the query:
$words = $_GET["words"];
$sql = "select id from products where name like '$words'";
$conn = pg_connect("host=$DB_IP port=5432 dbname=$DB_NAME user=postgres");
$res = pg_query($conn, $sql);
The GET query string was:
words=%8f%ac%90%ec%96%be%93%fa%8d%81
I think that PHP does some internal translation of this before passing
it on though.
> I assume the query passed to PostgreSQL is:
>
> select id from products where name like 'string';
Yes.
> where string is "0x8fac90ec96be93fa8d81".
That I don't know.
> If the string is supposed to be an EUC_JP, it would be parsed as follows:
>
> 8f: single shift 3 (indicates that following 2 bytes are a JIS 0212 character
[snip ...]
Ah ... so it is not an EUC-JP string but an SJIS string. Postgres was
right. That answers my question. Thanks!
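
The conclusion above can be checked outside PHP and PostgreSQL. The following is a minimal sketch in Python (not the PHP from this thread, and not how PostgreSQL or mbstring perform their checks): it URL-decodes the query string and attempts a strict decode under a few candidate encodings. The helper name `decodable_as` and the candidate list are assumptions made for illustration.

```python
from urllib.parse import unquote_to_bytes

# Candidate encodings to try; this list is an illustrative assumption.
CANDIDATES = ("euc_jp", "shift_jis", "utf-8")


def decodable_as(raw: bytes, encodings=CANDIDATES):
    """Return the names of the encodings under which `raw` decodes strictly."""
    ok = []
    for name in encodings:
        try:
            raw.decode(name)  # strict mode: raises on any invalid sequence
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# The GET query string value quoted in the message above.
raw = unquote_to_bytes("%8f%ac%90%ec%96%be%93%fa%8d%81")
print(raw.hex())          # 8fac90ec96be93fa8d81
print(decodable_as(raw))  # lists only the encodings that accepted the bytes
```

A successful strict decode only shows that the bytes are well-formed in that encoding; it does not prove which encoding the sender intended.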
>>PS I have also had the error pop up with this string:
>>
>>search_words=%B7%F6%BA%7E
>>select id from products where name like '??~'
>>Query failed: ERROR: Invalid EUC_JP character sequence found (0xba7e)
>
>
> This is definitely a bad EUC_JP.
According to a PHP developer in my bug report
(http://bugs.php.net/bug.php?id=24309&edit=2)
"URL decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is correct EUC-JP character sequence. [snip] But, I
believe encoding detection of mbstring works fine in this case.
B7E6+BA7E is not correct byte sequence of SJIS, UTF-8, ISO2022-JP. It is
correct EUC-JP byte sequence."
I see that he wrote B7E6 instead of the correct B7F6. I resubmitted my
bug report to PHP and pointed this out. Hopefully the developer will see
that this sequence is incorrect EUC-JP and that PHP failed to detect this :)
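
Why 0xba7e is rejected can be shown with a structural check of the EUC-JP byte ranges. This is a sketch in Python under stated assumptions: it tests only the byte layout (a two-byte character needs both bytes in 0xA1-0xFE, 0x8E is followed by one byte in 0xA1-0xDF, 0x8F is followed by two bytes in 0xA1-0xFE) and does not check whether a code point is actually assigned. The function name `first_invalid_euc_jp` is hypothetical, and PostgreSQL's own verification may differ in detail.

```python
def first_invalid_euc_jp(data: bytes):
    """Return the offset of the first structurally invalid EUC-JP sequence,
    or None if the whole byte string is well-formed."""

    def in_range(pos, lo, hi):
        # True when `pos` is inside the data and the byte lies in [lo, hi].
        return pos < len(data) and lo <= data[pos] <= hi

    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII, one byte
            i += 1
        elif b == 0x8E:                   # SS2: half-width katakana, 1 trail byte
            if not in_range(i + 1, 0xA1, 0xDF):
                return i
            i += 2
        elif b == 0x8F:                   # SS3: JIS X 0212, 2 trail bytes
            if not (in_range(i + 1, 0xA1, 0xFE) and in_range(i + 2, 0xA1, 0xFE)):
                return i
            i += 3
        elif 0xA1 <= b <= 0xFE:           # JIS X 0208, 2 bytes in total
            if not in_range(i + 1, 0xA1, 0xFE):
                return i
            i += 2
        else:                             # 0x80-0x8D, 0x90-0xA0, 0xFF
            return i
    return None


raw = bytes.fromhex("b7f6ba7e")           # URL-decoded %B7%F6%BA%7E
print(first_invalid_euc_jp(raw))          # 2: B7F6 passes, BA7E does not
```

The trailing 0x7E is plain ASCII `~`, which cannot be the second byte of a two-byte EUC-JP character, so the reported offset points at the same pair PostgreSQL names in its error message.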
I *knew* there was nothing wrong with Postgres ;)
Thanks!
Jean-Christian Imbeault
PS I posted to HACKERS a few weeks ago about another bug (a real one :)
in the EUC-JP translation having to do with the WAVE DASH. I'll repost it
here on the BUGS list. Could you let me know the status of that bug? Thanks!