From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Encoding problems in PostgreSQL with XML data |
Date: | 2004-01-15 11:10:16 |
Message-ID: | 1074165016.3206.27.camel@fuji.krosing.net |
Lists: | pgsql-hackers |
Merlin Moncure wrote on Wed, 14.01.2004 at 15:49:
> Hannu Krosing wrote:
> > I hope that real as-needed-column-by-column translation will be used
> > with bound argument queries.
> >
> > It also seems possible to delegate the encoding changes to after the
> > query is parsed, but this will never work for EBCDIC and other funny
> > encodings (like rot13 ;).
> >
> > for these we need to define the actual SQL statement encoding on-wire to
> > be always ASCII.
>
> In that case, treat the XML document like a binary stream, using
> PQescapeBytea, etc. to encode if necessary pre-query. Also, the XML
> domain should inherit from bytea, not varchar.
Why?
The allowed character repertoire in XML is even smaller than in varchar.
> The document should be stored bit for bit as was submitted.
Or in some pre-parsed form which allows restoration of the submitted form,
which could be better suited for things like XPath queries or subtree extraction.
> If we can do that for bitmaps, why can't we do it for XML documents?
>
> OTOH, if we are transforming the document down to a more generic format
> (either canonical or otherwise), then the xml could be dealt with like
> text in the usual way. Of course, then we are not really storing xml,
> more like 'meta' xml ;)
On the contrary! If there is a DTD, Schema, or other structure definition
for the XML, then we know which whitespace is significant and can do
whatever we like with the insignificant whitespace.
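To illustrate the whitespace point outside the database, here is a minimal Python sketch. It assumes element-only content, so that all whitespace between tags is insignificant; the `strip_text=True` option of the standard library's `canonicalize` stands in for what a DTD or Schema would tell us. The `<order>` document is made up for the example.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical document with element-only content: the whitespace
# between the tags carries no information.
pretty = "<order>\n  <item>1</item>\n  <item>2</item>\n</order>"
compact = "<order><item>1</item><item>2</item></order>"

# Without schema knowledge the whitespace must be kept, so the two differ.
differs_verbatim = canonicalize(pretty) != canonicalize(compact)

# Assumption: a schema says the whitespace is insignificant, so we drop it.
same_stripped = (canonicalize(pretty, strip_text=True)
                 == canonicalize(compact, strip_text=True))

print(differs_verbatim, same_stripped)
```

Note that `strip_text` also trims whitespace around real text content, so it is only a stand-in for proper DTD-driven handling.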
It is also OK to store all XML in some Unicode encoding, as this is what
every XML document must be convertible to.
It's the same as storing ints: you don't care whether you specified 1000 or
1e3 when doing the insert, as
hannu=# select 1000=1e3;
?column?
----------
t
(1 row)
In the same way, the following should also be true:
select
'<d/>'::xml = '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'::xml
;
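As a rough sketch of the comparison I have in mind, here is the same check done in Python by comparing canonical forms (C14N) instead of raw strings. This is only an illustration, not a proposal for how the backend should implement it; `xml_equal` is a hypothetical helper name.

```python
from xml.etree.ElementTree import canonicalize


def xml_equal(a: str, b: str) -> bool:
    """Compare two XML documents by canonical form, not byte for byte."""
    # Canonicalization drops the XML declaration and whitespace outside
    # the root element, and normalizes empty-element syntax.
    return canonicalize(a) == canonicalize(b)


short_form = "<d/>"
long_form = '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'

print(xml_equal(short_form, long_form))   # the two forms of <d/>
print(xml_equal(short_form, "<e/>"))      # a genuinely different document
```

The first comparison should come out true and the second false, just as 1000 = 1e3 is true for ints.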
-----------
Hannu