From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: possible encoding issues with libxml2 functions |
Date: | 2017-03-12 19:36:58 |
Message-ID: | CAFj8pRADS6yUVwziAS_v2c_oPEse6RZgk2WnSa8o3i-GwTKmaA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2017-03-12 0:56 GMT+01:00 Noah Misch <noah(at)leadboat(dot)com>:
> On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > Today I played with xml_recv function and with xml processing functions.
> >
> > xml_recv function ensures correct encoding from document encoding to
> server
> > encoding. But the decl section holds original encoding info - that should
> > be obsolete after encoding. Sometimes we solve this issue by removing
> decl
> > section - see the xml_out function.
> >
> > Sometimes we don't do it - lot of functions uses direct conversion from
> > xmltype to xmlChar.
>
> > There are possible two fixes
> >
> > a) clean decl on input - the encoding info can be removed from decl part
> >
> > b) use xml_out_internal everywhere before transformation to
> > xmlChar. pg_xmlCharStrndup can be good candidate.
>
> I'd prefer (a) if the xml type were a new feature, because no good can
> come of
> storing an encoding in each xml field when we know the actual encoding is
> the
> database encoding. However, if you implemented (a), we'd still see
> untreated
> values brought over via pg_upgrade. Therefore, I would try (b) first. I
> suspect the intent of xml_parse() was to implement (b); it will be
> interesting
> to see your test case that malfunctions.
>
I looked there again and I found so this issue is related to xpath function
only
Functions based on xml_parse are working without problems. xpath_internal
uses own direct xmlCtxtReadMemory without correct encoding sanitation.
so fix is pretty simple
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
ns_count = 0;
}
- datastr = VARDATA(data);
- len = VARSIZE(data) - VARHDRSZ;
+ datastr = xml_out_internal(data, 0);
+ len = strlen(datastr);
+
xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
if (xpath_len == 0)
ereport(ERROR,
(errcode(ERRCODE_DATA_EXCEPTION),
Regards
Pavel
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2017-03-12 19:38:58 | Re: [REVIEW] macaddr 64 bit (EUI-64) datatype support |
Previous Message | Tom Lane | 2017-03-12 19:29:37 | Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless) |