From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: XML with invalid chars |
Date: | 2011-04-28 03:22:37 |
Message-ID: | 4DB8DD7D.3070905@dunslane.net |
Lists: | pgsql-hackers |
On 04/27/2011 05:30 PM, Noah Misch wrote:
>
>> I'm not sure what to do about the back branches and cases where data is
>> already in databases. This is fairly ugly. Suggestions welcome.
> We could provide a script in (or linked from) the release notes for testing the
> data in all your xml columns.
Yeah, we'll have to do something like that. What a blasted mess.
> To make things worse, the dump/reload problem seems to depend on your version
> of libxml2, or something. With git master, a CentOS 5 system with
> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
> 2.6.31.dfsg-2ubuntu rejects it. Even with a patch like this, systems with a
> lenient libxml2 will be liable to store XML data that won't restore on a system
> with a strict libxml2. Perhaps we should emit a build-time warning if the local
> libxml2 is lenient?
No, I think we need to be strict ourselves.
>> + if (*p< '\x20')
> This needs to be an unsigned comparison. On my system, "char" is signed, so
> "SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.
Good point. Perhaps we'd be better off using iscntrl(*p).
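
To illustrate the signedness problem, here is a minimal sketch, not the patch itself. The helper name is_forbidden_control_byte is hypothetical, and exempting tab, LF and CR is an assumption based on the XML 1.0 character set rather than something settled in this thread. Casting to unsigned char first keeps high-bit bytes of multibyte UTF-8 characters (such as the 0xD5 0x90 encoding of U+0550) from comparing as negative. Note that iscntrl() also expects a value representable as unsigned char, so it would need the same cast.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not PostgreSQL code): report whether a single byte
 * is a C0 control character that XML 1.0 does not allow.
 *
 * The cast matters: where plain "char" is signed, a byte such as 0xD5 is
 * negative, so a bare (c < 0x20) test would wrongly flag it.
 */
static bool
is_forbidden_control_byte(char c)
{
    unsigned char uc = (unsigned char) c;

    /* Assumption: tab, LF and CR stay legal, as in the XML 1.0 Char rule. */
    return uc < 0x20 && uc != 0x09 && uc != 0x0A && uc != 0x0D;
}
```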
> The XML character set forbids more than just control characters; see
> http://www.w3.org/TR/xml/#charsets. We also ought to reject, for example,
> "SELECT xmlelement(name foo, null, E'\ufffe')".
>
> Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
> and "xmlpi" still let the invalid byte through. You can also still inject the
> byte into an attribute value via "xmlelement". I wonder if it wouldn't make
> more sense to just pass any XML that we generate from scratch through libxml2.
> There are a lot of holes to plug, otherwise.
>
Maybe there are, but I'd want lots of convincing that we should do that
at this stage. Maybe for 9.2. I think we can plug the holes fairly
simply for xmlpi and xmlcomment, and catch the attributes by moving this
check up into map_sql_value_to_xml_value().
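
For reference, a stricter check along the lines of the charset rule Noah cites could look roughly like the sketch below. It assumes the input has already been decoded to Unicode code points. The function name xml10_char_is_valid is hypothetical, and this is not what the patch currently does. The ranges are those of the XML 1.0 "Char" production.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): true if the code point is
 * allowed by the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool
xml10_char_is_valid(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Under this rule U+0001 and U+FFFE are rejected while U+0550 is accepted, which matches the cases discussed above.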
This is a significant data integrity bug, much along the same lines as
the invalidly encoded data holes we plugged a release or two back. I'm
amazed we haven't hit it till now, but we're sure to see more of it -
XML use with Postgres is growing substantially, I believe.
cheers
andrew