From: | "Florian G(dot) Pflug" <fgp(at)phlo(dot)org> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: xml type and encodings |
Date: | 2007-01-16 17:41:56 |
Message-ID: | 45AD0E64.2050301@phlo.org |
Lists: | pgsql-hackers |
Peter Eisentraut wrote:
> I wrote:
>> We need to decide on how to handle encoding information embedded in
>> xml data that is passed through the client/server encoding
>> conversion.
>
> Tangentially related, I'm currently experimenting with a setup that
> stores all xml data in UTF-8 on the server, converting it back to the
> server encoding on output. This doesn't do anything to solve the
> problem above, but it makes the internal processing much simpler, since
> all of libxml uses UTF-8 internally anyway. Is anyone opposed to that
> setup on principle?
If you do that, maybe it would be the easiest and least confusing thing
to just _always_ represent an xml document in utf-8, ignoring the client_encoding
entirely for xml. The only good reason for not using utf-8 that comes to
my mind is the increased storage size, especially for eastern scripts where
nearly all characters need 2 or more bytes. But if you store it in utf-8
internally anyway, then I don't think this argument carries a lot of weight
anymore...
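
To put rough numbers on the storage-size concern, here is a minimal Python
sketch. The sample strings and the legacy encodings picked for comparison
(EUC-JP, KOI8-R, Latin-1) are assumptions chosen for illustration, not
anything specific to the proposed xml type.

```python
# Compare the encoded size of the same text in UTF-8 and in a legacy encoding.
# Sample strings and legacy codecs are illustrative assumptions.
SAMPLES = [
    ("日本語", "euc_jp"),    # Japanese
    ("Привет", "koi8_r"),   # Russian
    ("hello", "latin_1"),   # plain ASCII text
]


def encoded_sizes(text, legacy_codec):
    """Return (utf8_bytes, legacy_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode(legacy_codec))


for text, codec in SAMPLES:
    utf8_len, legacy_len = encoded_sizes(text, codec)
    print(f"{text!r}: utf-8={utf8_len} bytes, {codec}={legacy_len} bytes")
```

ASCII markup costs the same either way, so the overhead only applies to the
non-ASCII character data in a document.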
You could warn the user about that fact whenever he sends or receives an
xml document, and the client_encoding is not set to utf-8.
Not that I'm entirely convinced about this being a good idea myself - but I
think I'd prefer a clear rule like that over surprises like "text and binary
output have different semantics" or "the encoding information is totally misleading
and must be ignored". And most software that uses xml probably uses utf-8...
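
The "misleading encoding information" case can be sketched in a few lines of
Python. This only mimics a byte-level conversion that leaves the XML
declaration untouched; the helper name naive_transcode and the sample
document are hypothetical, and this is not how the server actually performs
the conversion.

```python
# A document whose declaration says ISO-8859-1, encoded accordingly.
original = '<?xml version="1.0" encoding="ISO-8859-1"?><t>café</t>'.encode("latin_1")


def naive_transcode(data, src, dst):
    """Convert bytes between encodings without touching the XML declaration.

    Hypothetical helper standing in for a client/server encoding conversion.
    """
    return data.decode(src).encode(dst)


converted = naive_transcode(original, "latin_1", "utf-8")

# The declaration still claims ISO-8859-1 although the bytes are now UTF-8.
declaration_is_stale = b'encoding="ISO-8859-1"' in converted

# A consumer that trusts the declaration gets garbled character data;
# one that ignores it and assumes UTF-8 gets the intended text.
trusting_declaration = converted.decode("latin_1")
ignoring_declaration = converted.decode("utf-8")

print(trusting_declaration)
print(ignoring_declaration)
```

With a fixed "xml is always utf-8" rule, the declaration and the bytes on the
wire would not drift apart like this.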
greetings, Florian Pflug