| From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | xml type and encodings |
| Date: | 2007-01-14 22:39:42 |
| Message-ID: | 200701142339.42297.peter_e@gmx.net |
| Lists: | pgsql-hackers |
We need to decide on how to handle encoding information embedded in xml
data that is passed through the client/server encoding conversion.
Here is an example:
Client encoding is B, server encoding is A. Client sends an xml datum
that looks like this:
INSERT INTO table VALUES (xmlparse(document
'<?xml version="1.0" encoding="C"?><content>...</content>'));
Assuming that A, B, and C are all distinct, this could fail at a number
of places.
I suggest that we make the system ignore all encoding declarations in
xml data. That is, in the above example, the string would actually
have to be encoded in client encoding B on the client, would be
converted to A on the server and stored as such. As far as I can tell,
this is easily implemented and allowed by the XML standard.
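
To make the proposal concrete, here is a minimal Python sketch of the inbound path. It is not PostgreSQL code: the function names, the regular expression, and the choice to strip the declaration rather than merely skip over it are all assumptions made for illustration. The point it shows is that only the session's client and server encodings drive the conversion, and the encoding named inside the XML declaration is never consulted.

```python
import re

# Hypothetical helper pattern: matches the encoding pseudo-attribute of a
# leading XML declaration so that it can be dropped.
_ENCODING_DECL = re.compile(
    r"""^(<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*'))   # keep the version part
        \s+encoding\s*=\s*(?:"[^"]*"|'[^']*')           # drop the encoding part
    """,
    re.VERBOSE,
)


def strip_encoding_decl(xml_text: str) -> str:
    """Remove the encoding pseudo-attribute from the XML declaration, if any."""
    return _ENCODING_DECL.sub(r"\1", xml_text, count=1)


def client_to_server(raw: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Transcode an xml datum using the session encodings only.

    The encoding named in the datum's own XML declaration is ignored.
    Stripping it afterwards is one possible policy (an assumption here),
    chosen so that the stored value carries no misleading declaration.
    """
    text = raw.decode(client_encoding)
    return strip_encoding_decl(text).encode(server_encoding)


# The declaration claims ISO-8859-1, but the bytes really are in the
# client encoding (UTF-8 in this example); the server uses ISO-8859-15.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><content>é</content>'.encode("utf-8")
stored = client_to_server(raw, "utf-8", "iso-8859-15")
```

With this policy the conversion cannot fail because of a mismatch between the declaration and the actual byte encoding; it can only fail for the usual reason that a character is not representable in the target encoding.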
The same would be done on the way back. The datum would arrive in
encoding B on the client. It might be implementation-dependent whether
the datum actually contains an XML declaration specifying an encoding
and whether that encoding might read A, B, or C -- I haven't figured
that out yet -- but the client will always be required to consider it
to be B.
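
The return path can be sketched the same way. Again this is only an illustration in Python with made-up function and variable names, assuming the stored value still carries an old declaration. It shows why the client has to treat the datum as being in the client encoding: a reader that trusts a stale declaration decodes the bytes incorrectly.

```python
def server_to_client(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Transcode a stored xml datum for the client by session encodings only."""
    return stored.decode(server_encoding).encode(client_encoding)


# Assumed scenario: the server stores ISO-8859-1 and the stored datum still
# contains a declaration saying so; the client encoding is UTF-8.
stored = '<?xml version="1.0" encoding="ISO-8859-1"?><content>é</content>'.encode("iso-8859-1")
wire = server_to_client(stored, "iso-8859-1", "utf-8")

# What the proposal requires: interpret the bytes in the client encoding.
as_client_encoding = wire.decode("utf-8")

# What goes wrong if the client believes the embedded declaration instead:
# the two UTF-8 bytes of the accented character are read as two characters.
as_declared = wire.decode("iso-8859-1")
```

Whether the server should rewrite or drop the declaration before sending is the open question mentioned above; the sketch leaves it untouched to show the failure mode.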
What should be done about the binary send/receive functionality?
Looking at the send/receive functions for the text type, they
communicate all data in the server encoding, so it seems reasonable to
do this here as well.
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/