From: | Radosław Smogura <rsmogura(at)softperience(dot)eu> |
---|---|
To: | Florian Pflug <fgp(at)phlo(dot)org> |
Cc: | PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Review of patch Bugfix for XPATH() if expression returns a scalar value |
Date: | 2011-06-30 08:33:57 |
Message-ID: | fc947491f738a2c6c2232ec1eecfa51a@mail.softperience.eu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:
> On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
>> This is review of patch
>> https://commitfest.postgresql.org/action/patch_view?id=565
>> "Bugfix for XPATH() if expression returns a scalar value"
>>
>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth))
>> FROM (SELECT
>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
>> (XMLELEMENT(name
>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as
>> foo;
>>
>> xmlelement
>> -------------------------
>> <root sth="&lt;n"/>
>>
>> It's clearly visible that value from attribute is "<n", not "<".
>> Every
>> parser will read this as "<n" which is not-natural and will
>> require form
>> consumer/developer to de-escape this on his side - roughly speaking
>> this will
>> be reported as serious bug.
>
> There's a problem there, no doubt, but your proposed solution just
> moves the
> problem around.
>
> Here's your query, reformatted to be more legible
>
> SELECT
> XMLELEMENT(name root,
> XMLATTRIBUTES(foo.namespace AS sth)
> )
> FROM (
> SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace
> FROM (VALUES (XMLELEMENT(name "root",
> XMLATTRIBUTES('<n' AS xmlns,
> '<v' AS value),'<t')
> )
> ) v(x)) as foo;
>
> What happens is that the XPATH expression selects the xmlns
> attribute with the value '<n'. To be well-formed xml, that value
> must be escaped, so what is actually returned by the XPATH
> call is '<n'. The XMLATTRIBUTES() function, however, doesn't
> distinguish between input of type XML and input of type TEXT,
> so it figures it has to represent the *textual* value '<n'
> in xml, and produces '&lt;n'.
>
> Not escaping textual values returned by XPATH isn't a solution,
> though. For example, assume someone inserts the value returned
> by the XPATH() call in your query into a column of type XML.
> If XPATH() returned '<n' literally instead of escaping it,
> the column would contain non-well-formed XML in a column of type
> XML. Not pretty, and pretty fatal during your next dump/reload
> cycle.
>
> Or, if you want another example, simply remove the XMLATTRIBUTES
> call and use the value returned by XPATH as a child node directly.
>
> SELECT
> XMLELEMENT(name root,
> foo.namespace
> )
> FROM (
> SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace
> FROM (VALUES (XMLELEMENT(name "root",
> XMLATTRIBUTES('<n' AS xmlns,
> '<v' AS value),'<t')
> )
> ) v(x)) as foo;
>
> xmlelement
> --------------------
> <root><n</root>
>
> Note that this correct! The <root> node contains a text node
> representing the textual value '<n'. If XPATH() hadn't return
> the value escaped, the query above would have produces
>
> <root><n</root>
>
> which is obviously wrong. It works in this case because XMLELEMENT
> is smart enough to distinguish between child nodes gives as TEXT
> values (those are escaped) and child nodes provided as XML values
> (those are inserted unchanged).
>
> XMLATTRIBUTES, however, doesn't make that distinction, and simply
> escaped all attributes values independent of their type. Now, the
> reason it does that is probably that *not* all XML values are
> well-formed attribute values without additional escaping. For
> example,
> you cannot have literal '<' and '>' in attribute values. So if
> XMLATTRIBUTES, like XMLELEMENT never escaped values which are
> already of type XML, the following piece of code
>
> XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
>
> would produce
>
> <a v="<node/>"/>
>
> which, alas, is not well-formed :-(
>
> The correct solution, I believe, is to teach XMLATTRIBUTES to
> only escape those characters which are invalid in attribute
> values but valid in XML (Probably only '<' and '>' but I'll
> check). If there are no objects, I'll prepare a separate patch
> for that. I don't want to include it in this patch because it's
> really a distinct issue (even modifies a different function).
>
> best regards,
> Florian Pflug
You may manually enter invalid xml too, You don't need xpath for this.
Much more
In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
I executed
SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
and it's looks like I got correct output "<a v="<node/>"/>"
when I looked with text editor into table files I saw same value.
I will check on last master if I can reproduce this.
But indeed, now PostgreSQL is not type safe against XML type. See
SELECT XMLELEMENT(name "root", '<', (xpath('//text()',
'<root><</root>'))[1])).
You found some problem with escaping, but solution is bad, I think
problem lies with XML type, which may be used to hold strings and proper
xml. For example above query can't distinguish if (xpath('//text()',
'<root><</root>'))[1] is xml text, or xml element.
For me adding support for scalar is really good and needed, but
escaping is not.
Regards,
Radosław Smogura
From | Date | Subject | |
---|---|---|---|
Next Message | Hitoshi Harada | 2011-06-30 08:37:43 | Re: Parameterized aggregate subquery (was: Pull up aggregate subquery) |
Previous Message | Yeb Havinga | 2011-06-30 08:03:22 | Re: Parameterized aggregate subquery |