From: | Chapman Flack <chap(at)anastigmatix(dot)net> |
---|---|
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: PostgreSQL vs SQL/XML Standards |
Date: | 2019-02-11 15:51:25 |
Message-ID: | 3e8eab9e-7289-6c23-5e2c-153cccea2257@anastigmatix.net |
Lists: | pgsql-hackers |
[Resending to list so commitfest app will see it; the list blocked
this message the first time on a mail reputation issue. Sorry for
the duplication. I've removed the individual cc:s from this message.]
On 02/05/19 23:16, Chapman Flack wrote:
> I wonder whether, given the move to next CF, it makes sense to change
> the title of the CF entry from "XMLTABLE" to, more generically, XML
> improvements, and get one or two more small changes in:
Interpreting the crickets as approval, I have changed the title of the
CF entry, and the status back to Needs Review, with these patches
attached:
xmltable-xpath-result-processing-bugfix-6.patch
xmltable-xmlexists-passing-mechanisms-3.patch
xml-functions-type-docfix-2.patch
xml-content-2006-1.patch
That last one is new, and everything is rebased (onto 068503c).
xmltable-xpath-result-processing-bugfix-6.patch includes a regress/expected
output for the no-libxml case that was left out of -5.
xml-functions-type-docfix-2.patch removes one more sentence I had meant
to remove[1] but had forgotten to.
xml-content-2006-1.patch does this:
> - get XMLPARSE(CONTENT... (and cast-to-xml with XMLOPTION=content) to
> succeed even for content with DTDs, so that the content subtype really
> does fully include the document subtype, aligning it with the SQL:2006+
> standard. I think this would be a simple patch that I can deliver early
> this month, and Tom found reports where the current behavior already
> bites people in pg_restore. Its only effect would be to allow a currently-
> failing case to succeed (and stop biting people).
It works as suggested in [2], just by intercepting the error if a
parse-as-content trips over a DTD, and retrying as a parse-as-document.
While that has a certain hacky smell, it also has the advantage of
handling what's probably an uncommon edge case in a way that adds no
upfront cost. (Other, 'tidier' approaches could involve evaluating a
regex first to decide how to parse--I believe everything that's allowed
ahead of a DTD makes a regular language--but that would add cycles to
every parse.)
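The retry described above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the function name parse_xml_content is hypothetical, and wrapping the input in a dummy root element is an assumed stand-in for a real parse-as-content, chosen because it likewise rejects a DTD.

```python
import xml.etree.ElementTree as ET


def parse_xml_content(text):
    """Parse text as XML content, falling back to a document parse.

    Returns "content" or "document" to show which attempt succeeded.
    (Hypothetical helper for illustration only.)
    """
    try:
        # Stand-in for a content parse: wrap the input in a dummy root so
        # that bare text or several top-level elements are accepted. A
        # DOCTYPE is not legal inside an element, so this attempt fails
        # when the input carries a DTD.
        ET.fromstring("<dummy>" + text + "</dummy>")
        return "content"
    except ET.ParseError:
        # Retry as a complete document. If this fails too, the error
        # simply propagates to the caller.
        ET.fromstring(text)
        return "document"
```

Input without a DTD never reaches the except branch, so the common case pays nothing extra; only input that fails the first attempt is parsed twice.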
In xml.c one does find the following comment:
* TODO maybe libxml2's xmlreader is better? (do not construct DOM,
* yet do not use SAX - see xmlreader.c)
and yes, I think a complete rewrite of xml_parse along those lines would
probably be a substantial win (why construct an internal DOM just to confirm
that the input is parsable, then throw it away?). But that would be a more
involved rewrite that I'm not volunteering to do.
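To illustrate the general idea of confirming parsability without building a tree, here is a minimal sketch using Python's expat binding. It is not libxml2's xmlreader and says nothing about how xml_parse would actually be restructured; the function name is_well_formed is hypothetical.

```python
import xml.parsers.expat


def is_well_formed(data):
    """Return True if data is a well-formed XML document.

    No handlers are registered, so expat only scans the input and
    reports errors; no tree is built or retained.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True
```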
This patch is a quick way to get the desired behavior given the current
implementation.
-Chap
[1] https://www.postgresql.org/message-id/5C4A94A5.8010402%40anastigmatix.net
[2] https://www.postgresql.org/message-id/5C4BDBFF.6040905%40anastigmatix.net
Attachment | Content-Type | Size |
---|---|---|
xmltable-xpath-result-processing-bugfix-6.patch | text/x-patch | 14.4 KB |
xmltable-xmlexists-passing-mechanisms-3.patch | text/x-patch | 5.8 KB |
xml-functions-type-docfix-2.patch | text/x-patch | 30.1 KB |
xml-content-2006-1.patch | text/x-patch | 16.4 KB |