Re: [PATCH] Add CANONICAL option to xmlserialize

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize
Date: 2024-08-29 18:50:33
Message-ID: CAFj8pRDrgOoJzxxOAswGcr7E+JZ-1SOoX+Oy3_RTPV=Jg4YGHw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 27 Aug 2024 at 13:57, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
wrote:

>
>
> On 26.08.24 16:59, Pavel Stehule wrote:
> >
> > 1. what about behaviour of NO INDENT - the implementation is not too
> > old, so it can be changed if we want (I think), and it is better to do
> > early than too late
>
> While checking the feasibility of removing indentation with NO INDENT I
> may have found a bug in XMLSERIALIZE ... INDENT.
> xmlSaveToBuffer seems to ignore elements if there are whitespaces
> between them:
>
> SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>' AS text INDENT);
> xmlserialize
> -----------------
> <foo> +
> <bar>42</bar>+
> </foo> +
>
> (1 row)
>
> SELECT xmlserialize(DOCUMENT '<foo> <bar>42</bar> </foo>'::xml AS text
> INDENT);
> xmlserialize
> ----------------------------
> <foo> <bar>42</bar> </foo>+
>
> (1 row)
>
> I'll take a look at it.
>

+1

> Regarding removing indentation: yes, it would be possible with libxml2.
> The question is if it would be right to do so.
> > 2. Are we able to implement SQL/XML syntax with libxml2?
> >
> > 3. Are we able to implement Oracle syntax with libxml2? And there are
> > benefits other than higher possible compatibility?
> I guess it would be beneficial if you're migrating from oracle to
> postgres - or the other way around. It certainly wouldn't hurt, but so
> far I personally had little use for the oracle's extra xmlserialize
> features.
> >
> > 4. Can there be some possible collision (functionality, syntax) with
> > CANONICAL?
> I couldn't find anything in the SQL/XML spec that might refer to
> canonical XML.
> >
> > 5. SQL/XML XMLSERIALIZE supports other target types than varchar. I
> > can imagine XMLSERIALIZE with CANONICAL to bytea (then we don't need
> > to force database encoding). Does it make sense? Are the results
> > comparable?
> As of pg16 bytea is not supported. Currently type can be character,
> character varying, or text - also their other flavours like 'name'.
>

I know, but in theory CANONICAL could benefit if pg supported bytea there.
A lot of databases still use a non-UTF8 encoding.

This is more of a theoretical question: if pg supports other target types
there in the future (because of SQL/XML or Oracle), can CANONICAL be used
without restriction, or only for text? And are you sure that you can
compare text with text, instead of xml with xml?
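
To make the encoding concern concrete, here is a minimal sketch. It uses
Python's standard-library C14N 2.0 implementation as a stand-in, not libxml2
or the patch itself, and the two documents and the helper fits_encoding are
made up for illustration. Canonical XML is defined as UTF-8, so the
comparison is done on bytes, which is roughly what a bytea target would give:

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical inputs that differ only in attribute order, quoting,
# whitespace inside the tag, and how the euro sign is written.
doc_a = '<foo b="2" a="1">€</foo>'
doc_b = "<foo a='1'   b='2'>&#8364;</foo>"

# Canonical XML is UTF-8 by definition, so compare the UTF-8 bytes.
canon_a = canonicalize(doc_a).encode("utf-8")
canon_b = canonicalize(doc_b).encode("utf-8")
print(canon_a == canon_b)


def fits_encoding(data: bytes, encoding: str) -> bool:
    """Return True if the UTF-8 canonical form can be re-encoded losslessly."""
    try:
        data.decode("utf-8").encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A Latin-1 database encoding cannot hold the euro sign, UTF-8 can.
print(fits_encoding(canon_a, "latin-1"))
print(fits_encoding(canon_a, "utf-8"))
```

The two documents canonicalize to the same bytes, but that byte string
cannot be converted to Latin-1 without loss, which is the case where a text
result in the database encoding and a bytea result would behave differently.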

+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize;
+ ?column?
+----------
+ t
+ t
+(2 rows)

Maybe I am a little confused by this regression test, because in the end it
is not very useful: it compares two identical XML documents, and WITH
COMMENTS and WITHOUT COMMENTS are tested elsewhere. I tried to find the
purpose of this test. It would be better to compare really different
documents (columns) instead.
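
A minimal sketch of why the comparison only says something when the inputs
differ. It again uses Python's standard-library C14N 2.0 canonicalizer as a
stand-in for the patch, and the two sample documents and the helper name are
hypothetical, not rows from xmltest_serialize:

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical sample documents: one without and one with a comment.
doc_plain = "<foo><bar>42</bar></foo>"
doc_commented = "<foo><!-- note --><bar>42</bar></foo>"


def same_with_and_without_comments(doc: str) -> bool:
    """Compare the canonical form without comments to the one with comments."""
    without_comments = canonicalize(doc)  # comments are dropped by default
    with_comments = canonicalize(doc, with_comments=True)
    return without_comments == with_comments


print(same_with_and_without_comments(doc_plain))      # True
print(same_with_and_without_comments(doc_commented))  # False
```

For a document without comments the equality holds trivially, so it cannot
detect whether the comment handling works. Only an input that actually
contains a comment makes the two canonical forms differ.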

Regards

Pavel

>
>
> --
> Jim
>
>
