From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Encoding problems in PostgreSQL with XML data
Date: 2004-01-15 20:46:32
Message-ID: 1074199591.3292.12.camel@fuji.krosing.net
Lists: pgsql-hackers
Merlin Moncure wrote on Thu, 15.01.2004 at 18:43:
> Hannu Krosing wrote:
> > select
> > '<d/>'::xml == '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'::xml
>
> Right: I understand your reasoning here. Here is the trick:
>
> select '[...]'::xml introduces a casting step which justifies a
> transformation. The original input data is not xml, but varchar. Since
> there are no arbitrary rules on how to do this, we have some flexibility
> here to do things like change the encoding/mess with the whitespace. I
> am trying to find a way to break the assumption that my xml data
> necessarily has to be converted from raw text.
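The transformation that the cast licenses is easy to see with any off-the-shelf XML library; a minimal sketch in Python with ElementTree (just an illustration of the parse/serialize round trip, nothing PostgreSQL-specific):

```python
import xml.etree.ElementTree as ET

# The byte-for-byte input, declaration and trailing newline included.
original = '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'

# Parsing and re-serializing preserves the infoset, not the bytes:
# the declaration and surrounding whitespace do not survive.
round_tripped = ET.tostring(ET.fromstring(original), encoding="unicode")

assert round_tripped != original                  # bytes changed
assert ET.fromstring(round_tripped).tag == "d"    # same document content
```

So the two values compare equal as parsed documents while differing as raw text, which is exactly the '==' question above.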
>
> My basic point is that we are confusing the roles of storing and
> parsing/transformation. The question is: are we storing xml documents
> or the metadata that makes up xml documents? We need to be absolutely
> clear on which role the server takes on...in fact both roles may be
> appropriate for different situations, but should be represented by a
> different type. I'll try and give examples of both situations.
>
> If we are strictly storing documents, IMO the server should perform zero
> modification on the document. Validation could be applied conceptually
> as a constraint (and, possibly XSLT/XPATH to allow a fancy type of
> indexing). However there is no advantage that I can see to manipulating
> the document except to break the 'C' of ACID. My earlier comments wrt
> binary encoding were that there simply has to be a way to prevent the
> server from mucking with my document.
>
> For example, if I were using postgres to store XML-EDI documents in a DX
> system this is the role I would prefer. Validation and indexing are
> useful, but my expected use of the server is a type of electronic xerox
> of the incoming document. I would be highly suspicious of any
> modification the server made to my document for any reason.
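That "validate but never modify" role can be sketched in a few lines (Python here purely as illustration; `store_verbatim` is a hypothetical helper, not anything in postgres):

```python
import xml.etree.ElementTree as ET

def store_verbatim(raw: bytes) -> bytes:
    """Validation as a constraint: reject input that is not
    well-formed XML, but return the original bytes untouched."""
    ET.fromstring(raw)  # raises ParseError on malformed input
    return raw

doc = b'<?xml version="1.0" encoding="utf-8"?>\n<order id="1"/>\n'
assert store_verbatim(doc) is doc  # byte-identical: no server-side rewriting
```

The point of the sketch is that well-formedness checking and storage are separate steps: the parser's output is thrown away, and only the original bytes are kept.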
The current charset/encoding support can be evil in some cases ;(
The only solution seems to be keeping both the server and client encoding
as ASCII (or just disabling encoding conversion altogether).
The proper path to encodings must unfortunately do the encoding
conversions *after* parsing, when it is known which parts of the
original query string should be changed.
Or, as you suggested, always encode anything outside plain ASCII (n<32
or n>127), both on input (which can be done client-side) and on output
(which IIRC needs another type with a different output function).
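A client-side sketch of that escape-on-input idea (Python; the helper name and the exact byte ranges are my assumption, not an agreed design):

```python
def escape_non_ascii(text: str) -> str:
    """Replace control characters (other than tab/newline/CR) and
    anything above 127 with XML numeric character references,
    so only plain ASCII ever reaches the server."""
    out = []
    for ch in text:
        n = ord(ch)
        if (n < 32 and ch not in "\t\n\r") or n > 127:
            out.append(f"&#{n};")
        else:
            out.append(ch)
    return "".join(out)

assert escape_non_ascii("<d>õ</d>") == "<d>&#245;</d>"
assert escape_non_ascii("<d/>") == "<d/>"  # plain ASCII passes through
```

Since character references are part of XML itself, any conforming parser reconstructs the original characters on read, and the stored form is immune to server-side charset conversion.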
> Based on your suggestions I think you are primarily concerned with the
> second example. However, in my work I do a lot of DX and I see the xml
> document as a binary object. Server-side validation would be extremely
> helpful, but please don't change my document!
So the problem is not exactly XML, but rather problems with changing
encodings of "binary" strings that should not be changed.
I hope (but I'm not sure) that keeping the client and server encodings
the same prevents that.
> So, I submit that we are both right for different reasons.
Seems so.
-----------------
Hannu