From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
Cc: | Chapman Flack <chap(at)anastigmatix(dot)net>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org> |
Subject: | Re: [PATCH] Add CANONICAL option to xmlserialize |
Date: | 2024-08-26 12:15:56 |
Message-ID: | CAFj8pRAjy1ahmKNUQYYegj2_gsfEhy75hEN04-WurxanXec85g@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 26 Aug 2024 at 13:28, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
wrote:
>
>
> On 26.08.24 12:30, Pavel Stehule wrote:
> > I think the purpose of CANONICAL should be specified: is it a
> > partial replacement for NO INDENT, or does it produce a format just for
> > comparing? The CANONICAL format is probably not fully standardized,
> > because libxml2 removes indenting, but the examples in
> > https://www.w3.org/TR/xml-c14n11/ don't do that. So this format makes
> > sense just for local operations.
> My idea with CANONICAL was not to replace NO INDENT. The intent was to
> format xml strings in a standardized way, so that they can be compared.
> For instance, removing comments, sorting attributes, converting CDATA
> strings, converting empty elements to start-end tag pairs, removing
> white spaces between elements, etc ...
>
> The W3C recommendation for Canonical XML[1] dictates the following
> regarding the removal of whitespaces between elements :
>
> * Whitespace outside of the document element and within start and end
> tags is normalized
> * All whitespace in character content is retained (excluding characters
> removed during line feed normalization)
>
> >
> > I like this functionality, and it is great that the functionality from
> > libxml2 can be used, but I think the fact that there are four
> > incompatible implementations of xmlserialize is messy. It would be nice if
> > we found some intersection between SQL/XML and Oracle instead of a new
> > proprietary syntax.
> >
> > In Oracle syntax, is CANONICAL +/- NO INDENT SHOW DEFAULT ?
>
> No.
> XMLSERIALIZE ... NO INDENT is supposed, as the name suggests, to
> serialize an xml string without indenting it. One could argue that not
> indenting can be translated as removing indentation, but I couldn't find
> anything concrete about this in the SQL/XML spec. If it's indeed the
> case, we should correct XMLSERIALIZE .. NO INDENT, but it is unrelated
> to this patch.
>
> CANONICAL serializes a physical representation of an xml document. In a
> nutshell, XMLSERIALIZE ... CANONICAL sort of "rewrites" the xml string
> with the following rules (list from the W3C recommendation):
>
> * The document is encoded in UTF-8
> * Line breaks normalized to #xA on input, before parsing
> * Attribute values are normalized, as if by a validating processor
> * Character and parsed entity references are replaced
> * CDATA sections are replaced with their character content
> * The XML declaration and document type declaration are removed
> * Empty elements are converted to start-end tag pairs
> * Whitespace outside of the document element and within start and end
> tags is normalized
> * All whitespace in character content is retained (excluding characters
> removed during line feed normalization)
> * Attribute value delimiters are set to quotation marks (double quotes)
> * Special characters in attribute values and character content are
> replaced by character references
> * Superfluous namespace declarations are removed from each element
> * Default attributes are added to each element
> * Fixup of xml:base attributes [C14N-Issues] is performed
> * Lexicographic order is imposed on the namespace declarations and
> attributes of each element
>
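
To make the comparison use case concrete, here is a minimal sketch in
Python. It uses neither the patch nor libxml2: it relies on the standard
library's xml.etree.ElementTree.canonicalize, which implements C14N 2.0
rather than the C14N 1.1 recommendation cited above, so details may differ
from what XMLSERIALIZE ... CANONICAL would produce. The sample documents
are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two made-up documents that differ only in physical representation:
# XML declaration, attribute order and quoting, a comment, an
# empty-element tag, and a CDATA section.
doc_a = (
    '<?xml version="1.0"?>'
    "<root b=\"2\" a='1'><!-- note --><empty/><![CDATA[x < y]]></root>"
)
doc_b = '<root a="1" b="2"><empty></empty>x &lt; y</root>'

# Same content as doc_b, but indented. Whitespace in character content
# is retained by this implementation, so it is expected to stay different.
doc_c = '<root a="1" b="2">\n  <empty></empty>x &lt; y</root>'

canon_a = canonicalize(doc_a)  # comments are dropped by default
canon_b = canonicalize(doc_b)
canon_c = canonicalize(doc_c)

print(canon_a == canon_b)
print(canon_a == canon_c)
```

With this implementation the first comparison should hold and the second
should not, which illustrates the "whitespace in character content is
retained" rule from the list above.
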
> btw: Oracle's SIZE =, HIDE DEFAULTS, and SHOW DEFAULTS are not part of
> the SQL/XML standard either :)
>
I know. It looks like this function is not well designed in general.
>
> > My objection to CANONICAL is that SQL/XML and Oracle allow
> > parametrizing XMLSERIALIZE more precisely, and before implementing a new
> > feature, we should clear the table and say what we want to have in
> > XMLSERIALIZE.
> >
> > As an alternative to enhancing XMLSERIALIZE, I can imagine just a
> > function "to_canonical(xml, without_comments bool default false)". In
> > this case we don't need to resolve the relationship to SQL/XML or Oracle.
>
> Creating a separate serialization function would IMHO be way less
> elegant than parametrizing XMLSERIALIZE, but it would be something I
> could live with in case we decide to go down this path.
>
I am not strongly against enhancing XMLSERIALIZE, but it would be nice to see
a wider concept first. The current state looks rather random, and I
haven't seen any serious discussion about the implementation of SQL/XML. We
don't necessarily need to be compatible with Oracle, but it would help to have
functionality that can be used for conversions.
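
For illustration only, the shape of the standalone function mentioned above,
to_canonical(xml, without_comments bool default false), can be sketched
outside the database. This is Python, not PostgreSQL code: the function is a
hypothetical stand-in that wraps the standard library's
xml.etree.ElementTree.canonicalize (C14N 2.0), and the sample input is
made up.

```python
from xml.etree.ElementTree import canonicalize


def to_canonical(xml: str, without_comments: bool = False) -> str:
    """Hypothetical stand-in for the proposed SQL function."""
    # ElementTree's flag has the opposite polarity:
    # with_comments=True keeps comments in the output.
    return canonicalize(xml, with_comments=not without_comments)


sample = '<a y="2" x="1"><!--c--><b/></a>'  # made-up input
kept = to_canonical(sample)
stripped = to_canonical(sample, without_comments=True)
```

A single boolean parameter is enough to cover the with-comments and
without-comments variants, which is the only choice this signature exposes.
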
> Thanks!
>
> --
> Jim
>
> 1 - https://www.w3.org/TR/xml-c14n11/
>
>