From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | Chapman Flack <chap(at)anastigmatix(dot)net>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org> |
Subject: | Re: [PATCH] Add CANONICAL option to xmlserialize |
Date: | 2024-08-29 21:54:18 |
Message-ID: | 145c2f83-8610-4eba-a24d-b1e8620e47dd@uni-muenster.de |
Lists: | pgsql-hackers |
On 29.08.24 20:50, Pavel Stehule wrote:
>
> I know, but theoretically, there can be some benefit for CANONICAL if
> pg supports bytea there. Lot of databases still use non utf8 encoding.
>
> It is a more theoretical question - if pg supports different types
> there in future (because SQL/XML or Oracle), then CANONICAL can be
> used without limit,
I like the idea of extending the feature to support bytea. I can
definitely take a look at it, but perhaps in another patch? This change
would most likely involve transformXmlSerialize in parse_expr.c, and I'm
not sure what impact it would have on other uses of XMLSERIALIZE.
> or CANONICAL can be used just for text? And you are sure, so you can
> compare text X text, instead xml X xml?
Yes, currently it only supports varchar or text - and their cousins. The
idea is to format the xml and serialize it as text in such a way that
documents can be compared based on their content, independently of how
they were written, e.g. '<foo a="1" b="2"/>' is equal to
'<foo b="2" a="1"/>'.
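
To illustrate the attribute-order point outside PostgreSQL, here is a minimal sketch using Python's standard-library C14N 2.0 implementation (xml.etree.ElementTree.canonicalize). This is an assumption made for illustration only: it is not the code path the patch uses, and its serialized output may differ in details from what xmlserialize ... CANONICAL produces.

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same element: only the attribute order differs.
doc_a = '<foo a="1" b="2"/>'
doc_b = '<foo b="2" a="1"/>'

# Compared as plain text, the two documents are different strings.
raw_equal = (doc_a == doc_b)

# Canonicalization puts attributes into a fixed order, so the
# serialized text becomes comparable by content.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
canon_equal = (canon_a == canon_b)

print(raw_equal, canon_equal)
```

The comparison is then a plain text-to-text comparison of the canonical forms, which is the same idea as comparing two xmlserialize(... AS text CANONICAL) results.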
>
> +SELECT xmlserialize(CONTENT doc AS text CANONICAL) =
> xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM
> xmltest_serialize;
> + ?column?
> +----------
> + t
> + t
> +(2 rows)
>
> Maybe I am a little bit confused by these regress tests, because at
> the end it is not too useful - you compare two identical XML, and WITH
> COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search
> for a sense of this test. Better to use really different documents
> (columns) instead.
Yeah, I can see that it's confusing. In this example I actually just
wanted to test that the default option of CANONICAL is CANONICAL WITH
COMMENTS, even if you don't mention it. In the docs I mentioned it like
this:
"The optional parameters WITH COMMENTS (which is the default) or WITH NO
COMMENTS, respectively, keep or remove XML comments from the given
document."
Perhaps I should rephrase it? Or maybe a comment in the regression tests
would suffice?
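
As a rough sketch of why the choice of test document matters, the following again uses Python's xml.etree.ElementTree.canonicalize as a stand-in (an assumption for illustration, not the patch's implementation). Note that Python's default is the opposite of the one described above: it drops comments unless with_comments=True is passed. The document string is hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical test document that actually contains a comment.
doc = '<foo><!-- note --><bar/></foo>'

kept = canonicalize(doc, with_comments=True)      # comments preserved
dropped = canonicalize(doc, with_comments=False)  # comments removed

# Python's default drops comments (unlike the default described
# in the docs text quoted above).
default = canonicalize(doc)

# The two modes only differ when the input has a comment, so a test
# of "default == explicit option" is only meaningful on such input.
print(kept != dropped, default == dropped)
```

By the same reasoning, a check that the default matches WITH COMMENTS is most convincing when the document contains a comment and the WITH NO COMMENTS result is shown to differ.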
Thanks a lot for the input!
--
Jim