BUG #4622: xpath only works with UTF-8 server encoding

From: "Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4622: xpath only work in utf-8 server encoding
Date: 2009-01-22 13:39:00
Message-ID: 200901221339.n0MDd0dE033542@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 4622
Logged by: Sergey Burladyan
Email address: eshkinkot(at)gmail(dot)com
PostgreSQL version: 8.3.5
Operating system: Debian testing
Description: xpath only works with UTF-8 server encoding
Details:

Hello, all!

I am testing parsing of an XML string in a server encoding other than UTF-8.
The string loads correctly, but xpath(text, xml) cannot handle it:

seb(at)seb:~/tmp/pg$ echo $LANG
ru_RU.CP1251
seb(at)seb:~/tmp/pg$ /usr/lib/postgresql/8.3/bin/postgres -p 5433 -k s -s -D .
LOG: database system was shut down at 2009-01-22 16:30:07 MSK
LOG: autovacuum launcher started
LOG: database system is ready to accept connections

seb(at)seb:~$ echo $LANG
ru_RU.CP1251
seb(at)seb:~$ psql -h localhost -p 5433
Welcome to psql 8.3.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

seb=# select * from (select
xml('<русский>язык</русский>')) as x(v);
v
-------------------------
<русский>язык</русский>
(1 row)

seb=# select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
ERROR: could not parse XML data
DETAIL: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF0 0xF3 0xF1 0xF1
<x><русский>язык</русский></x>
^
seb=# select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+--------------
 client_encoding | WIN1251
 lc_collate      | ru_RU.CP1251
 lc_ctype        | ru_RU.CP1251
 lc_messages     | ru_RU.CP1251
 lc_monetary     | ru_RU.CP1251
 lc_numeric      | ru_RU.CP1251
 lc_time         | ru_RU.CP1251
 server_encoding | WIN1251
(8 rows)
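The bytes in the libxml2 error can be checked directly. The following is a minimal Python sketch (an illustration, not part of the original report), assuming PostgreSQL's WIN1251 corresponds to Python's `cp1251` codec. It encodes the element name the way a WIN1251 server stores it, confirms that the first four bytes are exactly the ones reported (`0xF0 0xF3 0xF1 0xF1`), and shows that they are not valid UTF-8.

```python
# Sketch: reproduce the byte sequence from the libxml2 error message.
# Assumption: PostgreSQL's WIN1251 server encoding == Python's "cp1251" codec.

element_name = "русский"

# Encode the element name as a WIN1251 server would store it.
win1251_bytes = element_name.encode("cp1251")
first_four = win1251_bytes[:4]
print(" ".join(f"0x{b:02X}" for b in first_four))  # 0xF0 0xF3 0xF1 0xF1

# The error says the parser expected UTF-8; check whether these bytes qualify.
try:
    first_four.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xF0 starts a 4-byte UTF-8 sequence, but 0xF3 is not a continuation byte.
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)  # valid UTF-8: False
```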

With a UTF-8 server encoding it works correctly:

seb=> select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
xpath
--------
{язык}
(1 row)

seb=> select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+-------------
 client_encoding | UTF8
 lc_collate      | ru_RU.UTF-8
 lc_ctype        | ru_RU.UTF-8
 lc_messages     | ru_RU.UTF-8
 lc_monetary     | ru_RU.UTF-8
 lc_numeric      | ru_RU.UTF-8
 lc_time         | ru_RU.UTF-8
 server_encoding | UTF8
(8 rows)

I think something is wrong here: the string is parsed correctly by
xml(text), but its result cannot be passed to the xpath function.
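The behaviour above (xml(text) accepts the string, xpath() rejects it) fits the XML text reaching the parser still as WIN1251 bytes while the parser assumes UTF-8. The following is a minimal Python sketch of that diagnosis and of converting to UTF-8 before parsing. It is an assumption-based illustration, not PostgreSQL's code or its actual fix. Python's `xml.etree.ElementTree` stands in for libxml2, and `parse_raw` and `parse_converted` are hypothetical helpers. The `<x>...</x>` wrapper mirrors the text shown in the error output.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the document as stored in a WIN1251 database, wrapped in
# <x>...</x> the way the error output shows it being passed to the parser.
stored = "<x><русский>язык</русский></x>".encode("cp1251")


def parse_raw(data: bytes) -> ET.Element:
    """Parse bytes as-is; with no encoding declaration the parser assumes UTF-8."""
    return ET.fromstring(data)


def parse_converted(data: bytes, server_encoding: str = "cp1251") -> ET.Element:
    """Hypothetical fix: convert from the server encoding to UTF-8, then parse."""
    utf8 = data.decode(server_encoding).encode("utf-8")
    return ET.fromstring(utf8)


try:
    parse_raw(stored)
    raw_ok = True
except ET.ParseError:
    # Same failure mode as the report: the WIN1251 bytes are not valid UTF-8.
    raw_ok = False

root = parse_converted(stored)
# ElementTree's limited path syntax; roughly '/x/русский/text()' in XPath.
result = root.findtext("русский")
print(raw_ok)  # False
print(result)  # язык
```

Converting once, at the boundary where text leaves the database encoding, lets the parser keep its UTF-8 default. With a UTF8 server encoding that conversion is a no-op, which matches the second session working.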
