From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Encoding problems in PostgreSQL with XML data |
Date: | 2004-01-09 20:44:14 |
Message-ID: | 3FFF129E.6020109@dunslane.net |
Lists: | pgsql-hackers |
Perhaps the document should be stored in canonical form. See
http://www.w3.org/TR/xml-c14n
I think I agree with Rod's opinion elsewhere in this thread. I guess the
"philosophical" question is this: If 2 XML documents with different
encodings have the same canonical form, or perhaps produce the same DOM,
are they equivalent? Merlin appears to want to say "no", and I think I
want to say "yes".
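To make the question concrete, here is a minimal Python sketch (not part of the original message) comparing two serializations of one document. It assumes Python 3.8 or later, whose xml.etree.ElementTree.canonicalize implements C14N 2.0 rather than the 1.0 recommendation linked above; the sample document and the canonical_form helper are made up for illustration.

```python
import io
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document; the body contains one non-ASCII character.
body = "<doc><name>caf\u00e9</name></doc>"

# The same document serialized in two encodings, each with a matching declaration.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")


def canonical_form(raw: bytes) -> str:
    """Parse raw XML bytes and return the canonical (C14N 2.0) text."""
    return canonicalize(from_file=io.BytesIO(raw))


# Byte-for-byte comparison (the "binary data" view).
bytes_equal = utf8_doc == latin1_doc

# Comparison after canonicalization (the "same canonical form" view).
canon_equal = canonical_form(utf8_doc) == canonical_form(latin1_doc)

print("bytes equal:    ", bytes_equal)
print("canonical equal:", canon_equal)
```

The stored bytes differ, yet the canonical forms compare equal, which is the sense of "equivalent" I am arguing for.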
cheers
andrew
Merlin Moncure wrote:
>Peter Eisentraut wrote:
>
>
>>The central problem I have is this: How do we deal with the fact that
>>an XML datum carries its own encoding information?
>>
>>
>
>Maybe I am misunderstanding your question, but IMO postgres should be
>treating xml documents as if they were binary data, unless the server
>takes on the role of a parser, in which case it should handle
>unspecified/unknown encodings just like a normal xml parser would (and
>this does *not* include changing the encoding!).
>
>According to me, an XML parser should not change one bit of a document,
>because that is not a 'parse', but a 'transformation'.
>
>
>
>>Rewriting the <?xml?> declaration seems like a workable solution, but it
>>would break the transparency of the client/server encoding conversion.
>>Also, some people might dislike that their documents are being changed
>>as they are stored.
>>
>>
>
>Right, your example begs the question: why does the server care what the
>encoding of the documents is (perhaps indexing)? XML validation is a
>standardized operation which the server (or psql, I suppose) can
>subcontract out to another application.
>
>Just a side thought: what if the xml encoding type was built into the
>domain type itself?
>create domain xml_utf8 ...
>Which allows casting, etc. which is more natural than an implicit
>transformation.
>
>Regards,
>Merlin
>