From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | radist-hack(at)yandex(dot)ru |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Strange output of XML attribute values |
Date: | 2020-09-16 12:50:30 |
Message-ID: | CAFj8pRAg6xwEow=NcTFgtm35MrK-h35bxZhnxBXDdd3BYPGS6A@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, Sep 16, 2020 at 14:11, Andrew Marynchuk (Андрей Маринчук) <
radist(dot)nt(at)gmail(dot)com> wrote:
> This problem is quite old, but it makes the XML generation functions in
> PostgreSQL unusable in some cases, or at least requires subsequent
> parsing and regeneration of the XML by an external utility. It
> reproduces in PostgreSQL 12.4, compiled by Visual C++ build 1914, 64-bit
> (Windows 10), but I've seen the same problem in a 9.6 build from the
> CentOS yum package.
>
> *How to reproduce*:
> Just execute the query (actually the xmlelement call is enough to
> reproduce the problem):
> select xmlserialize(document xmlroot(xmlelement(name "ЭлементВКириллице",
> xmlattributes('ЗначениеВКириллице' as "АтрибутВКириллице"),
> 'ТекстВКириллице'), version '1.0', standalone yes) as text);
>
> *Expected result*:
> <?xml version="1.0" standalone="yes"?><ЭлементВКириллице
> АтрибутВКириллице="ЗначениеВКириллице">ТекстВКириллице</ЭлементВКириллице>
>
> *Actual result*:
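> <?xml version="1.0" standalone="yes"?><ЭлементВКириллице
> АтрибутВКириллице="&#x417;&#x43D;&#x430;&#x447;&#x435;&#x43D;&#x438;&#x435;&#x412;&#x41A;&#x438;&#x440;&#x438;&#x43B;&#x43B;&#x438;&#x446;&#x435;">ТекстВКириллице</ЭлементВКириллице>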
>
> This example uses Cyrillic letters, but it could be any non-ASCII
> character.
> According to the discussion
> <https://www.sql.ru/forum/775061/russkiy-yazyk-v-xml?hl=libxml>, this
> problem arises because PostgreSQL does not give libxml2 any information
> about the document encoding, due to the lack of an
> xmlTextWriterStartDocument call, so libxml2 has no idea that the encoding
> is UTF-8 and that non-ASCII characters could be written without
> converting them to &#x...;-sequences.
>
I don't think that is true. The URL encoding is done only in attribute
values, and only when the output encoding is UTF-8. If you use an 8-bit
encoding that supports Cyrillic, it will be fine.
>
> In the modern world, UTF-8 encoding is used everywhere, and such
> unnecessary character conversion looks strange. The current workaround is
> to pass the generated content to a PL/Python function that parses the XML
> and writes it back (xml.dom.minidom.parseString(...).toxml()).
>
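A minimal sketch of the workaround quoted above, run as plain Python 3 rather than inside PL/Python. The helper names `to_char_refs` and `reserialize` and the sample string are assumptions made for illustration; only `xml.dom.minidom.parseString(...).toxml()` comes from the original message. It shows that parsing and re-serializing turns `&#x...;` sequences in an attribute value back into literal characters.

```python
from xml.dom import minidom


def to_char_refs(text: str) -> str:
    """Encode every character as a hexadecimal character reference (&#x...;)."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


def reserialize(xml_text: str) -> str:
    """Parse the XML and write it back; minidom writes non-ASCII characters literally."""
    return minidom.parseString(xml_text).toxml()


# Hypothetical stand-in for the text returned by xmlserialize(...):
# the attribute value is made of character references, the rest is literal.
escaped = (
    '<ЭлементВКириллице АтрибутВКириллице="'
    + to_char_refs("ЗначениеВКириллице")
    + '">ТекстВКириллице</ЭлементВКириллице>'
)

print(reserialize(escaped))
```

Note that `toxml()` on the parsed document also writes its own XML declaration, so the `standalone` setting from `xmlroot` may need to be restored separately.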
If I remember the discussion about this topic correctly, the problem is the
XML standard, which requires URL encoding in attribute values.
I reported this issue 10 (maybe 15) years ago to the libxml2 developers, and
it was rejected. Maybe libxml2 supports XML standards that are too old. I
don't know; this library has been in a frozen state for years, but there is
no replacement.
It is a libxml2 problem, and that is where it should be reported and fixed.
It is not possible to fix this issue on the Postgres side.
Regards
Pavel