From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | radist-hack(at)yandex(dot)ru |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Strange output of XML attribute values |
Date: | 2020-09-16 13:40:20 |
Message-ID: | CAFj8pRAvUqUmGFjDYhNt32Udd87dL_0e2muerxwCFw9C08qx8w@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, 16 Sep 2020 at 14:50, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:
>
>
> On Wed, 16 Sep 2020 at 14:11, Andrew Marynchuk (Андрей Маринчук) <
> radist(dot)nt(at)gmail(dot)com> wrote:
>
>> This problem is quite old, but in some cases it makes the XML generation
>> functions in PostgreSQL unusable, or at least requires the generated XML
>> to be parsed and regenerated by an external utility. It reproduces in
>> PostgreSQL 12.4, compiled by Visual C++ build 1914, 64-bit (Windows 10),
>> and I've seen the same problem in a 9.6 build from the CentOS yum package.
>>
>> *How to reproduce*:
>> Just execute the query (actually, the xmlelement call alone is enough to
>> reproduce the problem):
>> select xmlserialize(document xmlroot(xmlelement(name "ЭлементВКириллице",
>> xmlattributes('ЗначениеВКириллице' as "АтрибутВКириллице"),
>> 'ТекстВКириллице'), version '1.0', standalone yes) as text);
>>
>> *Expected result*:
>> <?xml version="1.0" standalone="yes"?><ЭлементВКириллице
>> АтрибутВКириллице="ЗначениеВКириллице">ТекстВКириллице</ЭлементВКириллице>
>>
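>> *Actual result*:
>> <?xml version="1.0" standalone="yes"?><ЭлементВКириллице
>> АтрибутВКириллице="&#x417;&#x43D;&#x430;&#x447;&#x435;&#x43D;&#x438;&#x435;&#x412;&#x41A;&#x438;&#x440;&#x438;&#x43B;&#x43B;&#x438;&#x446;&#x435;">ТекстВКириллице</ЭлементВКириллице>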
>>
>> This example uses Cyrillic letters, but it could be any non-ASCII
>> character.
>> According to the discussion
>> <https://www.sql.ru/forum/775061/russkiy-yazyk-v-xml?hl=libxml>, this
>> problem arises because PostgreSQL does not provide libxml2 with the
>> document encoding (there is no xmlTextWriterStartDocument call), so
>> libxml2 has no idea that the encoding is UTF-8 and that non-ASCII
>> characters could be written without converting them to &#x...; sequences.
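A minimal Python model of the escaping described above, assuming (as the linked discussion says) that when no document encoding is known, every non-ASCII character in an attribute value is written as a hexadecimal character reference. The function name is hypothetical and this is not libxml2's actual code; it only reproduces the shape of the output seen in the actual result.

```python
def escape_attr_without_encoding(value: str) -> str:
    """Hypothetical model: escape XML special characters in an attribute
    value, then replace every non-ASCII character with a hexadecimal
    character reference, as if the output encoding were unknown."""
    out = []
    for ch in value:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == '"':
            out.append("&quot;")
        elif ord(ch) < 0x80:
            # Plain ASCII is written unchanged.
            out.append(ch)
        else:
            # Non-ASCII becomes a character reference, e.g. 'З' (U+0417) -> '&#x417;'.
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


print(escape_attr_without_encoding("Зн"))  # &#x417;&#x43D;
```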
>>
>
> I don't think that is true. The escaping to &#x...; sequences is done only in
> attribute values, and only when the output encoding is UTF-8. When you use an
> 8-bit encoding with Cyrillic (Azbuka) support, it works fine.
>
>
>>
>> In the modern world, UTF-8 is used everywhere, and such unnecessary
>> character conversion looks strange. The current workaround is to pass the
>> generated content to a PL/Python function that parses the XML and writes it
>> back out (xml.dom.minidom.parseString(...).toxml()).
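The parse-and-rewrite workaround can be sketched as a plain Python function; the PL/Python wrapper around it is not shown, and the name `reserialize` is hypothetical. It relies on minidom's serializer escaping only `&`, `<`, `>` and `"` in attribute values, so non-ASCII characters are written back literally.

```python
import xml.dom.minidom


def reserialize(xml_text: str) -> str:
    """Hypothetical helper: parse the XML (which turns &#x...; references
    back into characters) and serialize it again with minidom."""
    return xml.dom.minidom.parseString(xml_text).toxml()


escaped = '<?xml version="1.0" standalone="yes"?><e a="&#x417;&#x43D;">t</e>'
print(reserialize(escaped))
```

Note that `toxml()` writes its own `<?xml version="1.0" ?>` declaration, so the `standalone="yes"` produced by xmlroot() is not carried over unless it is passed explicitly (the `standalone` argument exists from Python 3.9).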
>>
>
> If I remember the discussion about this topic correctly, the problem is the
> XML standard, which requires this escaping in attribute values.
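Whatever the reason for the escaping, a conforming XML parser reads a character reference and the literal character as the same attribute value, so the two outputs differ only in form. A small check using Python's `xml.etree.ElementTree` (chosen here only for illustration; it is not what PostgreSQL uses):

```python
import xml.etree.ElementTree as ET

escaped = '<e a="&#x417;&#x43D;">t</e>'  # attribute written as character references
literal = '<e a="Зн">t</e>'              # attribute written as literal UTF-8 text

# Both spellings parse to the identical attribute value.
value_escaped = ET.fromstring(escaped).get("a")
value_literal = ET.fromstring(literal).get("a")
print(value_escaped == value_literal)
```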
>
> I reported this issue to the libxml2 developers 10 (maybe 15) years ago, and
> it was rejected. Maybe libxml2 follows an outdated XML standard. I don't
> know; this library has been frozen for years, but there is no replacement.
>
> It is a libxml2 problem, and it should be reported and fixed there. It is
> not possible to fix this issue on the Postgres side.
>
xmlTextWriterWriteAttribute does this unwanted encoding;
xmlTextWriterWriteRaw doesn't:
postgres=# select xmlelement(name "aho", xmlattributes('žlutý kůň' as x),
'žlutý kǔň');
NOTICE: >>>>>žlutý kůň
NOTICE: **** >>>>>žlutý kǔň
┌───────────────────────────────────────────────────────────┐
│ xmlelement │
╞═══════════════════════════════════════════════════════════╡
│ <aho x="žlutý kůň">žlutý kǔň</aho> │
└───────────────────────────────────────────────────────────┘
(1 row)
So somewhere in there is the information needed to understand this
issue.
Regards
Pavel
>
> Regards
>
> Pavel
>
>