From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: possible encoding issues with libxml2 functions |
Date: | 2017-08-20 20:37:10 |
Message-ID: | CAFj8pRAECZC9v==YOFi2bpgcxDvMmpe80oC90Z9Ches41xOiNw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
> xpath-bugfix.patch affected only xml values containing an xml declaration
> with
> "encoding" attribute. In UTF8 databases, this latest proposal
> (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch. In
> non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
> containing non-ASCII data. In a LATIN1 database, the following works today
> but breaks under your latest proposal:
>
> SELECT xpath('text()', ('<x>' || convert_from('\xc2b0', 'LATIN1') ||
> '</x>')::xml);
>
I don't understand, why it should not work?
>
> It's acceptable to break that, since the documentation explicitly disclaims
> support for it. xpath-bugfix.patch breaks different use cases, which are
> likewise acceptable to break. See my 2017-08-08 review for details.
>
> We have xpath-bugfix.patch and xpath-parsing-error-fix.patch. Both are
> equivalent under supported use cases (xpath in UTF8 databases). Among
> non-supported use cases, they each make different things better and
> different
> things worse. We should prefer to back-patch the version harming fewer
> applications. I expect non-ASCII data is more common than xml declarations
> with "encoding" attribute, so xpath-bugfix.patch will harm fewer
> applications.
>
> Having said that, I now see a third option. Condition this thread's
> patch's
> effects on GetDatabaseEncoding()==PG_UTF8. That way, we fix supported
> cases,
> and we remain bug-compatible in unsupported cases. I think that's better
> than
> the other options discussed so far. If you agree, please send a patch
> based
> on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and
> the
> two edits I described earlier.
>
I am sorry - too long day today. Do you think some like
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 24229c2dff..9fd6f3509f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3914,7 +3914,14 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
if (ctxt == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
- doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+
+ /*
+ * Passed XML is always in server encoding. When server encoding
+ * is UTF8, we can pass this information to libxml2 to ignore
+ * possible invalid encoding declaration in XML document.
+ */
+ doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
+ GetDatabaseEncoding() == PG_UTF8 ? "UTF-8" : NULL, 0);
if (doc == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"could not parse XML document");
This fix only UTF8 servers and has no any effect on other cases
?
Regards
Pavel
> Thanks,
> nm
>
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2017-08-20 21:21:47 | Re: Support for Secure Transport SSL library on macOS as OpenSSL alternative |
Previous Message | Fabien COELHO | 2017-08-20 20:30:31 | Re: pgbench tap tests & minor fixes |