Re: XMLDocument (SQL/XML X030)

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Chapman Flack <jcflack(at)acm(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: XMLDocument (SQL/XML X030)
Date: 2025-01-28 08:14:48
Message-ID: 82cbac18-7bc6-42ec-be2f-4e66358bd43f@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert

On 28.01.25 05:54, Robert Treat wrote:
> Was playing around with the patch and was thinking about this wording:
> "The xmldocument function returns the input argument
> unchanged... and is provided for compatibility."
>
> When I run an example similar to the db2 example you gave earlier:
>
> pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
> xmldocument
> --------------------
> <x>10</x><y>20</y>
>
> In the db2 case, this is preserved as UPPER (which is to say, db2 case
> folds UPPER, and the input happens to match that), but we case fold
> lower, because we do; presumably you'd get the opposite effect in db2
> running the input with unquoted lower field names(?).

Yes.

SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1

FOO        
-----------
         42

  1 record(s) selected.

> In any case (no
> pun intended), SQL folks probably don't care much about that
> discrepancy, but given xml is case sensitive, maybe xml people do?

That's a good point. DB2 converts unquoted identifiers to uppercase by
default, which, if memory serves, aligns with the SQL standard. In the
case of this xmlforest example, my guess is that DB2 treats the elements
as identifiers and normalizes them to uppercase as well, as DB2 does not
handle XML documents as text like we do. To preserve case, you'd need to
explicitly quote the identifiers:

SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1

--------------------
<x>10</x><y>20</y>

  1 record(s) selected.

Things look different when constructing the xml document directly from a
string:

SELECT xmlparse(DOCUMENT '<root><foo>bar</foo></root>') FROM
SYSIBM.SYSDUMMY1

----------------------------
<root><foo>bar</foo></root>

  1 record(s) selected.

I'd say the difference is due to how the two systems handle the XML data
type and unquoted identifiers in general, rather than a difference in
the behaviour of the function itself. Sticking to quoted identifiers in
both systems helps:

SELECT xmlforest(42 AS "foo", 73 AS "bar");
         xmlforest          
----------------------------

 <foo>42</foo><bar>73</bar>

Probably that's why most DB2 examples in their documentation use quoted
identifiers :)

Best regards, Jim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2025-01-28 08:42:38 Re: POC, WIP: OR-clause support for indexes
Previous Message Keisuke Kuroda 2025-01-28 07:48:09 Re: [PATCH] Improve code coverage of network address functions