From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com> |
Subject: | Re: [PATCH] Add pretty-printed XML output option |
Date: | 2023-03-14 22:57:22 |
Message-ID: | abd25443-ef6d-7b8a-c593-a2a991d3e5ce@uni-muenster.de |
Lists: | pgsql-hackers |
On 14.03.23 18:40, Tom Lane wrote:
> Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
>> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]
> I poked at this for awhile and ran into a problem that I'm not sure
> how to solve: it misbehaves for input with embedded DOCTYPE.
>
> regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
> <!DOCTYPE a>+
> <a></a> +
>
> (1 row)
The culprit was the XML_SAVE_NO_EMPTY flag, which forces empty elements
to be serialized as start-end tag pairs. Removing it did the trick ...
postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
xmlserialize
--------------
<!DOCTYPE a>+
<a/> +
(1 row)
... but as a side effect, empty start-end tag pairs will now be serialized
as empty-element tags:
postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
xmlserialize
--------------
<foo> +
<bar/> +
</foo>
(1 row)
This seems to be the standard behavior of other XML indentation tools
(including Oracle).
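To illustrate the trade-off outside of PostgreSQL, here is a minimal sketch
using Python's standard-library xml.etree.ElementTree. This is not libxml2 and
not the patch's code; the helper name serialize and its collapse_empty argument
are made up for this example. ElementTree's short_empty_elements switch is only
a rough analogue of XML_SAVE_NO_EMPTY (with the opposite sense). Once parsed,
'<bar/>' and '<bar></bar>' are the same childless element, so the serializer
can only apply one form to all of them.

```python
import xml.etree.ElementTree as ET


def serialize(xml_text: str, collapse_empty: bool) -> str:
    """Parse xml_text and serialize it again (hypothetical helper).

    collapse_empty=True  -> childless elements are written as one empty tag
    collapse_empty=False -> childless elements are written as start-end pairs
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(
        root,
        encoding="unicode",
        short_empty_elements=collapse_empty,
    )


# The original spelling of the empty element is not preserved by the parser,
# so the output form depends only on the serializer switch.
print(serialize("<foo><bar></bar></foo>", collapse_empty=True))
# <foo><bar /></foo>
print(serialize("<foo><bar/></foo>", collapse_empty=False))
# <foo><bar></bar></foo>
```

Note that ElementTree writes a space before the closing slash, which libxml2
does not do in the psql output shown above.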
> regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
>
> (1 row)
>
> The bad result for CONTENT is because xml_parse() decides to
> parse_as_document, but xmlserialize_indent has no idea that happened
> and tries to use the content_nodes list anyway. I don't especially
> care for the laissez faire "maybe we'll set *content_nodes and maybe
> we won't" API you adopted for xml_parse, which seems to be contributing
> to the mess. We could pass back more info so that xmlserialize_indent
> knows what really happened.
I added a new (nullable) parameter to the xml_parse function that
returns the actual XmlOptionType used to parse the XML data. Now
xmlserialize_indent knows how the data was really parsed:
postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
xmlserialize
--------------
<!DOCTYPE a>+
<a/> +
(1 row)
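The idea of reporting the effective parse mode back to the caller can be
sketched as follows. This is a toy model in Python, not the C code from the
patch: XmlOption, ParseResult, parse and choose_source are hypothetical names,
and the DOCTYPE check is a deliberately simplified stand-in for whatever
xml_parse really does when it decides to parse as a document.

```python
from enum import Enum
from typing import NamedTuple


class XmlOption(Enum):
    DOCUMENT = "document"
    CONTENT = "content"


class ParseResult(NamedTuple):
    text: str
    parsed_as: XmlOption  # the mode actually used, not the one requested


def parse(text: str, requested: XmlOption) -> ParseResult:
    # Assumption for this sketch: CONTENT input that starts with a DOCTYPE
    # is silently handled as a document.
    if requested is XmlOption.CONTENT and text.lstrip().startswith("<!DOCTYPE"):
        return ParseResult(text, XmlOption.DOCUMENT)
    return ParseResult(text, requested)


def choose_source(text: str, requested: XmlOption) -> str:
    # The serializer branches on what the parser reports, so it never reads
    # a content node list that was not filled in.
    result = parse(text, requested)
    if result.parsed_as is XmlOption.DOCUMENT:
        return "document tree"
    return "content node list"


print(choose_source("<!DOCTYPE a><a/>", XmlOption.CONTENT))  # document tree
print(choose_source("<foo/><bar/>", XmlOption.CONTENT))      # content node list
```

Branching on the requested mode instead would reproduce the empty result from
the quoted CONTENT query, because the node list stays empty when the parser
switches to document mode.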
I added test cases for these queries.
v23 attached.
Thanks!
Best, Jim
Attachment | Content-Type | Size |
---|---|---|
v23-0001-Add-pretty-printed-XML-output-option.patch | text/x-patch | 39.5 KB |