From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | xml type and encodings |
Date: | 2007-01-14 22:39:42 |
Message-ID: | 200701142339.42297.peter_e@gmx.net |
Lists: | pgsql-hackers |
We need to decide on how to handle encoding information embedded in xml
data that is passed through the client/server encoding conversion.
Here is an example:
Client encoding is A, server encoding is B. Client sends an xml datum
that looks like this:
INSERT INTO table VALUES (xmlparse(document '<?xml version="1.0"
encoding="C"?><content>...</content>'));
Assuming that A, B, and C are all distinct, this could fail at a number
of places.
I suggest that we make the system ignore all encoding declarations in
xml data. That is, in the above example, the string would actually
have to be encoded in client encoding A on the client, would be
converted to B on the server and stored as such. As far as I can tell,
this is easily implemented and allowed by the XML standard.
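
To make the proposal concrete, here is a minimal sketch of the inbound
path in Python. It is only an illustration, not the server code: the
function name and the concrete encodings standing in for A, B, and C
(ISO-8859-1, UTF-8, KOI8-R) are assumptions chosen for the example.

```python
# Hypothetical stand-ins for the encodings in the example above.
CLIENT_ENCODING = "iso-8859-1"  # A
SERVER_ENCODING = "utf-8"       # B


def client_to_server(xml_bytes: bytes) -> bytes:
    """Convert an xml datum from client to server encoding.

    The embedded encoding declaration is never consulted: the bytes are
    interpreted as CLIENT_ENCODING no matter what the declaration says.
    """
    text = xml_bytes.decode(CLIENT_ENCODING)
    return text.encode(SERVER_ENCODING)


# The datum declares encoding C (KOI8-R here), but is really sent in A.
datum = '<?xml version="1.0" encoding="KOI8-R"?><content>caf\u00e9</content>'
sent = datum.encode(CLIENT_ENCODING)
stored = client_to_server(sent)
```

Note that the declaration text itself passes through unchanged; only
the byte representation of the datum changes.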
The same would be done on the way back. The datum would arrive in
encoding A on the client. It might be implementation-dependent whether
the datum actually contains an XML declaration specifying an encoding
and whether that encoding might read A, B, or C -- I haven't figured
that out yet -- but the client will always be required to consider it
to be A.
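
The return path could be sketched the same way. Again this is only an
illustration under assumed names and encodings (UTF-8 as the server
encoding, ISO-8859-1 as the client encoding); the point is that the
client decodes with its own encoding whatever the declaration reads.

```python
def server_to_client(stored: bytes, server_encoding: str,
                     client_encoding: str) -> bytes:
    """Convert a stored xml datum to the client encoding for output."""
    return stored.decode(server_encoding).encode(client_encoding)


def client_read(received: bytes, client_encoding: str) -> str:
    """Interpret a received datum using the client encoding only.

    Any encoding declaration inside the datum is deliberately ignored.
    """
    return received.decode(client_encoding)


# Hypothetical datum whose declaration names a third encoding.
original = '<?xml version="1.0" encoding="KOI8-R"?><content>caf\u00e9</content>'
on_server = original.encode("utf-8")
on_wire = server_to_client(on_server, "utf-8", "iso-8859-1")
seen_by_client = client_read(on_wire, "iso-8859-1")
```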
What should be done about the binary send/receive functionality?
Looking at the send/receive functions for the text type, they
communicate all data in the server encoding, so it seems reasonable to
do this here as well.
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/