From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | Chapman Flack <chap(at)anastigmatix(dot)net>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Add CANONICAL option to xmlserialize |
Date: | 2024-08-26 09:32:02 |
Message-ID: | 33ed592d-079e-4536-b4d9-35303343cc1b@uni-muenster.de |
Lists: | pgsql-hackers |
Hi Pavel
On 25.08.24 20:57, Pavel Stehule wrote:
>
> There is unwanted white space in the patch
>
> -<-><--><-->xmlFreeDoc(doc);
> +<->else if (format == XMLSERIALIZE_CANONICAL || format ==
> XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS)
> + <>{
> +<-><-->xmlChar *xmlbuf = NULL;
> +<-><-->int nbytes;
> +<-><-->int
>
I missed that one. Just removed it, thanks!
> 1. The xml is always serialized to a UTF8 string, but when the target
> type is varchar or text, it should always be encoded in the database
> encoding. It is not possible to hold a UTF8 string in a varchar in a
> latin2 database.
I'm calling xml_parse using GetDatabaseEncoding(), so I thought I would
be on the safe side:

    if (format == XMLSERIALIZE_CANONICAL ||
        format == XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS)
        doc = xml_parse(data, XMLOPTION_DOCUMENT, false,
                        GetDatabaseEncoding(), NULL, NULL, NULL);

... or do you mean something else?
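
For illustration only, here is a minimal self-contained C sketch of the concern in point 1 as I read it: the same character has a different byte representation in UTF-8 than in a single-byte encoding, so a UTF-8 result would need a conversion step before being returned as text. The function name demo_utf8_to_latin1 is hypothetical, and ISO-8859-1 stands in for latin2 only because its mapping is trivial. In the backend this would presumably go through the existing encoding conversion routines (e.g. pg_any_to_server()) rather than anything hand-written.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch or of PostgreSQL.
 *
 * Converts a NUL-terminated UTF-8 string to ISO-8859-1 (used here as a
 * simple stand-in for a single-byte database encoding such as latin2).
 * Only 1- and 2-byte UTF-8 sequences are handled.
 *
 * Returns the number of bytes written (excluding the trailing NUL), or -1
 * if the input is malformed, contains a code point above U+00FF, or does
 * not fit into the output buffer.
 */
static long
demo_utf8_to_latin1(const unsigned char *in, unsigned char *out, size_t outlen)
{
    size_t      n = 0;

    if (outlen == 0)
        return -1;

    while (*in)
    {
        unsigned int cp;

        if (in[0] < 0x80)
        {
            /* plain ASCII: same byte in both encodings */
            cp = in[0];
            in += 1;
        }
        else if ((in[0] & 0xE0) == 0xC0 && (in[1] & 0xC0) == 0x80)
        {
            /* 2-byte sequence: 110xxxxx 10xxxxxx */
            cp = ((in[0] & 0x1Fu) << 6) | (in[1] & 0x3Fu);
            if (cp < 0x80)
                return -1;      /* overlong encoding */
            in += 2;
        }
        else
            return -1;          /* longer sequences cannot fit in one byte */

        /* need room for this byte plus the trailing NUL */
        if (cp > 0xFF || n + 1 >= outlen)
            return -1;

        out[n++] = (unsigned char) cp;
    }

    out[n] = '\0';
    return (long) n;
}
```

Feeding it the five UTF-8 bytes of "café" ("caf\xC3\xA9") yields four bytes ending in 0xE9, while a character with no single-byte equivalent in this sketch, such as the euro sign ("\xE2\x82\xAC"), is rejected with -1.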
> 2. The proposed feature can increase confusion around the implementation
> of NO INDENT. I am not an expert in this area, so I checked other
> databases. DB2 does not have anything similar. But Oracle's "NO INDENT"
> clause is very similar to the proposed "CANONICAL". Unfortunately,
> the behaviour of NO INDENT differs: Oracle really removes
> formatting, whereas Postgres does nothing.
Coincidentally, the [NO] INDENT support for xmlserialize is an old patch
of mine.
With NO INDENT (or with no option at all) it basically "does nothing" and
prints the xml as is, while INDENT pretty-prints it, e.g.
SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text INDENT);
                xmlserialize
--------------------------------------------
 <foo>                                     +
   <bar>                                   +
     <val z="1" a="8"><![CDATA[0&1]]></val>+
   </bar>                                  +
 </foo>                                    +
(1 row)

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text NO INDENT);
                         xmlserialize
--------------------------------------------------------------
 <foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>
(1 row)

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text);
                         xmlserialize
--------------------------------------------------------------
 <foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>
(1 row)

... while CANONICAL converts the xml to its canonical form,[1,2] e.g.
sorting attributes and replacing CDATA sections with their (escaped)
content:

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text CANONICAL);
                     xmlserialize
------------------------------------------------------
 <foo><bar><val a="8" z="1">0&amp;1</val></bar></foo>
(1 row)
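
To make the two transformations in the CANONICAL example concrete, here is a minimal self-contained C sketch that sorts the attributes of a single element by name and emits its text content with the special characters escaped, as happens when a CDATA section is replaced by its content. This is only an illustration under simplifying assumptions (no namespaces, no escaping of attribute values, one element with plain text content). The names DemoAttr and demo_canonical_element are hypothetical, and this is not how the patch works: there the canonicalization is left to libxml2's C14N support [2].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record; not a libxml2 or PostgreSQL type. */
typedef struct
{
    const char *name;
    const char *value;
} DemoAttr;

/* qsort comparator: order attributes by name (no namespaces assumed). */
static int
demo_attr_cmp(const void *a, const void *b)
{
    return strcmp(((const DemoAttr *) a)->name,
                  ((const DemoAttr *) b)->name);
}

/* Append s to out at *pos; returns -1 if it does not fit. */
static int
demo_append(char *out, size_t outlen, size_t *pos, const char *s)
{
    size_t      len = strlen(s);

    if (*pos + len + 1 > outlen)
        return -1;
    memcpy(out + *pos, s, len + 1);     /* copies the trailing NUL too */
    *pos += len;
    return 0;
}

/*
 * Serialize <tag attrs...>text</tag> into out, with attributes sorted by
 * name and '&', '<' and '>' in the text replaced by entity references.
 * The attrs array is sorted in place. Returns 0 on success, or -1 if the
 * buffer is too small (out is then incomplete and must not be used).
 */
static int
demo_canonical_element(const char *tag, DemoAttr *attrs, size_t nattrs,
                       const char *text, char *out, size_t outlen)
{
    size_t      pos = 0;
    size_t      i;
    int         rc = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    if (nattrs > 1)
        qsort(attrs, nattrs, sizeof(DemoAttr), demo_attr_cmp);

    rc |= demo_append(out, outlen, &pos, "<");
    rc |= demo_append(out, outlen, &pos, tag);
    for (i = 0; i < nattrs; i++)
    {
        rc |= demo_append(out, outlen, &pos, " ");
        rc |= demo_append(out, outlen, &pos, attrs[i].name);
        rc |= demo_append(out, outlen, &pos, "=\"");
        rc |= demo_append(out, outlen, &pos, attrs[i].value);
        rc |= demo_append(out, outlen, &pos, "\"");
    }
    rc |= demo_append(out, outlen, &pos, ">");

    for (const char *p = text; *p; p++)
    {
        char        one[2] = {*p, '\0'};

        if (*p == '&')
            rc |= demo_append(out, outlen, &pos, "&amp;");
        else if (*p == '<')
            rc |= demo_append(out, outlen, &pos, "&lt;");
        else if (*p == '>')
            rc |= demo_append(out, outlen, &pos, "&gt;");
        else
            rc |= demo_append(out, outlen, &pos, one);
    }

    rc |= demo_append(out, outlen, &pos, "</");
    rc |= demo_append(out, outlen, &pos, tag);
    rc |= demo_append(out, outlen, &pos, ">");

    return rc ? -1 : 0;
}
```

Called with the tag "val", the attributes z="1" and a="8" (in that order) and the text 0&1, it fills the buffer with <val a="8" z="1">0&amp;1</val>, which matches the corresponding part of the CANONICAL output.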
xmlserialize CANONICAL does not exist in any other database and it's not
part of the SQL/XML standard.
Regarding the different behaviour of NO INDENT in Oracle and PostgreSQL:
it is not entirely clear to me if SQL/XML states that NO INDENT must
remove the indentation from xml strings.
It says:
"INDENT — the choice of whether to “pretty-print” the serialized XML by
means of indentation, either
True or False.
....
i) If <XML serialize indent> is specified and does not contain NO, then
let IND be True.
ii) Otherwise, let IND be False."
When I wrote the patch I assumed it meant to leave the xml as is .. but
I might be wrong.
Perhaps it would be best if we open a new thread for this topic.
Thank you for reviewing this patch. Much appreciated!
Best,
--
Jim
1 - https://www.w3.org/TR/xml-c14n11/
2 - https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-c14n.html