From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | [PATCH] Add CANONICAL option to xmlserialize |
Date: | 2023-03-05 18:44:01 |
Message-ID: | 2f04079e-b237-1a2f-0682-9f60ca59805e@uni-muenster.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 27.02.23 14:16, I wrote:
> Hi,
>
> In order to compare pairs of XML documents for equivalence it is
> necessary to convert them first to their canonical form, as described
> at W3C Canonical XML 1.1.[1] This spec basically defines a standard
> physical representation of xml documents that have more then one
> possible representation, so that it is possible to compare them, e.g.
> forcing UTF-8 encoding, entity reference replacement, attributes
> normalization, etc.
>
> Although it is not part of the XML/SQL standard, it would be nice to
> have the option CANONICAL in xmlserialize. Additionally, we could also
> add the attribute WITH [NO] COMMENTS to keep or remove xml comments
> from the documents.
>
> Something like this:
>
> WITH t(col) AS (
> VALUES
> ('<?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE doc SYSTEM "doc.dtd" [
> <!ENTITY val "42">
> <!ATTLIST xyz attr CDATA "default">
> ]>
>
> <!-- ordering of attributes -->
> <foo ns:c = "3" ns:b = "2" ns:a = "1"
> xmlns:ns="http://postgresql.org">
>
> <!-- Normalization of whitespace in start and end tags -->
> <!-- Elimination of superfluous namespace declarations,
> as already declared in <foo> -->
> <bar xmlns:ns="http://postgresql.org" >&val;</bar >
>
> <!-- Empty element conversion to start-end tag pair -->
> <empty/>
>
> <!-- Effect of transcoding from a sample encoding to UTF-8 -->
> <iso8859>©</iso8859>
>
> <!-- Addition of default attribute -->
> <!-- Whitespace inside tag preserved -->
> <xyz> 321 </xyz>
> </foo>
> <!-- comment outside doc -->'::xml)
> )
> SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t;
> xmlserialize
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2"
> ns:c="3"><bar>42</bar><empty></empty><iso8859>©</iso8859><xyz
> attr="default"> 321 </xyz></foo>
> (1 row)
>
> -- using WITH COMMENTS
>
> WITH t(col) AS (
> VALUES
> (' <foo ns:c = "3" ns:b = "2" ns:a = "1"
> xmlns:ns="http://postgresql.org">
> <!-- very important comment -->
> <xyz> 321 </xyz>
> </foo>'::xml)
> )
> SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t;
> xmlserialize
> ------------------------------------------------------------------------------------------------------------------------
>
> <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><!--
> very important comment --><xyz> 321 </xyz></foo>
> (1 row)
>
>
> Another option would be to simply create a new function, e.g.
> xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this
> would be the right approach.
>
> Attached a very short draft. What do you think?
>
> Best, Jim
>
> 1- https://www.w3.org/TR/xml-c14n11/
The attached version includes documentation and tests to the patch.
I hope things are clearer now :)
Best, Jim
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-CANONICAL-format-to-xmlserialize.patch | text/x-patch | 31.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Melanie Plageman | 2023-03-05 20:26:12 | Re: Should vacuum process config file reload more often |
Previous Message | Ankit Kumar Pandey | 2023-03-05 18:17:57 | Re: [Question] Similar Cost but variable execution time in sort |