From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com> |
Subject: | Re: [PATCH] Add pretty-printed XML output option |
Date: | 2023-03-10 13:30:07 |
Message-ID: | 906cfd1d-63a0-3cf5-7291-2dadb2e149ff@uni-muenster.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks a lot for the review!
On 09.03.23 21:21, Tom Lane wrote:
> I've looked through this now, and have some minor complaints and a major
> one. The major one is that it doesn't work for XML that doesn't satisfy
> IS DOCUMENT. For example,
>
> regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
> ?column?
> ----------
> f
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
> xmlserialize
> -------------------------------------------
> <bar><val x="y">42</val></bar><foo></foo>
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
> ERROR: invalid XML document
> DETAIL: line 1: Extra content at the end of the document
> <bar><val x="y">42</val></bar><foo></foo>
> ^
I assumed it should fail because the XML string doesn't have a
singly-rooted XML. Oracle has this feature implemented and it does not
seem to allow non singly-rooted strings either[1]. Also, some the tools
I use also fail in this case[2,3]
How do you suggest the output should look like? Does the SQL spec also
define it? I can't find it online :(
> This is not what the documentation promises, and I don't think it's
> good enough --- the SQL spec has no restriction saying you can't
> use INDENT with CONTENT. I tried adjusting things so that we call
> xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
> but all that got me was empty output (except for a document header).
> It seems like xmlDocDumpFormatMemory is not the thing to use, at least
> not in the CONTENT case. But libxml2 has a few other "dump"
> functions, so maybe we can use a different one? I see we are using
> xmlNodeDump elsewhere, and that has a format option, so maybe there's
> a way forward there.
>
> A lesser issue is that INDENT tacks on a document header (XML declaration)
> whether there was one or not. I'm not sure whether that's an appropriate
> thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
> case. We have code that can strip off the header again, but we
> need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to
option XML_SAVE_NO_DECL for input docs with XML declaration. It no
longer returns a XML declaration if the input doc does not contain it.
> I also suspect that it's outright broken to attach a header claiming
> the data is now in UTF8 encoding. If the database encoding isn't
> UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc and UTF8 if not provided.
> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified. We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.
>
> I also had a bunch of cosmetic complaints (mostly around this having
> a bad case of add-at-the-end-itis), which I've cleaned up in the
> attached v20. This doesn't address any of the above, however.
I swear to god I have no idea what "add-at-the-end-itis" means :)
> regards, tom lane
Thanks a lot!
Best, Jim
1 - https://dbfiddle.uk/WUOWtjBU
Attachment | Content-Type | Size |
---|---|---|
v21-0001-Add-pretty-printed-XML-output-option.patch | text/x-patch | 29.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Önder Kalacı | 2023-03-10 14:28:33 | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Previous Message | Justin Pryzby | 2023-03-10 13:05:49 | Re: Add LZ4 compression in pg_dump |