Encoding problems in PostgreSQL with XML data

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Encoding problems in PostgreSQL with XML data
Date: 2004-01-09 18:46:01
Message-ID: 200401091946.01930.peter_e@gmx.net
Lists: pgsql-hackers

This is not directly related to current development, but it is something
that might need a low-level solution. I've been thinking for some time
about how to enhance the current "XML support" (e.g., contrib/xml).

The central problem I have is this: How do we deal with the fact that
an XML datum carries its own encoding information?

Here's a scenario: It is desirable to have validity checking on XML
input, be it a special XML data type or some functions that take XML
data. Say we define a data type that stores XML documents and rejects
documents that are not well-formed. I want to insert something in
psql:

CREATE TABLE test (
    description text,
    content xml
);

INSERT INTO test VALUES ('test document',
    '<?xml version="1.0"?><doc><para>blah</para>...</doc>');

Now an XML parser will assume this document is in UTF-8, because the
declaration names no encoding, and let's say that at the client it
really is. What if client_encoding=UNICODE but
server_encoding=LATIN1? Do we expect some layer to rewrite the <?xml?>
declaration to contain the correct encoding information? Or can the
xml type bypass encoding conversion? What about reading it back out of
the database with yet another client encoding?
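
To make the scenario concrete, here is a minimal sketch in Python, outside
the database. It assumes the document contains at least one non-ASCII
character and that the server simply transcodes the bytes without touching
the declaration; the sample text and the helper name parses() are only for
illustration.

```python
import xml.etree.ElementTree as ET

# Sample document with non-ASCII text ("Gruesse" with u-umlaut and sharp s)
# and no encoding in the declaration, so a parser falls back to UTF-8.
doc = '<?xml version="1.0"?><doc><para>Gr\u00fc\u00dfe</para></doc>'

utf8_bytes = doc.encode("utf-8")      # what a UNICODE client would send
latin1_bytes = doc.encode("latin-1")  # the same text after conversion to LATIN1


def parses(data: bytes) -> bool:
    """Return True if the bytes form a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


client_ok = parses(utf8_bytes)    # declaration and bytes agree
server_ok = parses(latin1_bytes)  # bytes are LATIN1, parser still assumes UTF-8
```

The converted bytes are no longer valid UTF-8, so a well-formedness check
run on the server side would reject a document that was fine at the client.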

Rewriting the <?xml?> declaration seems like a workable solution, but it
would break the transparency of the client/server encoding conversion.
Also, some people might dislike that their documents are being changed
as they are stored.
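
As a rough sketch of what such a rewriting layer might do, again in Python
and not a proposal for the actual implementation: the helper
rewrite_declaration() below is hypothetical, handles only a declaration at
the very start of the document, and assumes the target encoding name is one
the parser understands.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration at the start of the document, with an
# optional encoding attribute and anything else (e.g. standalone) after it.
_DECL = re.compile(
    r'^<\?xml\s+version=(["\'])([^"\']*)\1'
    r'(?:\s+encoding=(["\'])[^"\']*\3)?([^?]*)\?>'
)


def rewrite_declaration(doc: str, encoding: str) -> str:
    """Return doc with its declaration naming the given encoding."""
    m = _DECL.match(doc)
    if m is None:
        # No declaration present: add one.
        return f'<?xml version="1.0" encoding="{encoding}"?>' + doc
    version, rest = m.group(2), m.group(4)
    return (f'<?xml version="{version}" encoding="{encoding}"{rest}?>'
            + doc[m.end():])


doc = '<?xml version="1.0"?><doc><para>Gr\u00fc\u00dfe</para></doc>'

# Simulate storing on a LATIN1 server: fix the declaration, then transcode.
stored = rewrite_declaration(doc, "ISO-8859-1").encode("latin-1")
root = ET.fromstring(stored)  # parses, because declaration and bytes agree
```

The stored document now parses, but it is no longer byte-for-byte what the
client sent, which is exactly the objection above.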

Any ideas?
