Re: Fix XML handling with DOCTYPE

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-17 17:11:28
Message-ID: 5C8E7FC0.8040401@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/17/19 11:45, Tom Lane wrote:
> Chapman Flack <chap(at)anastigmatix(dot)net> writes:
>> On 03/16/19 17:21, Tom Lane wrote:
>>> (1) always try to parse as document. If successful, we're done.
>>> (2) otherwise, if allowed by xmloption, try to parse using our
>
>> What I don't like about that is that (a) the input could be
>> arbitrarily long and complex to parse (not that you can't imagine
>> a database populated with lots of short little XML snippets, but
>> at the same time, a query could quite plausibly deal in yooge ones),
>> and (b), step (1) could fail at the last byte of the input, followed
>> by total reparsing as (2).
>
> That doesn't seem particularly likely to me: based on what's been
> said here, I'd expect parsing with the wrong expectation to usually
> fail near the start of the input. In any case, the other patch
> also requires repeat parsing, no? It's just doing that in a different
> set of cases.

I'll do up a version with the open-coded prescan I proposed last night.

Whether parsing with the wrong expectation is likely to fail near the
start of the input depends on both the input and the expectation. If
your expectation is DOCUMENT and the input is CONTENT, it's possible
for the determining difference to be something that follows the first
element, and a first element can be (and often is) nearly all of the input.

What I was doing in the patch is the reverse: parsing with the expectation
of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
element to precede a DTD, so that approach can be expected to fail fast
if the other branch needs to be taken.

But a quick pre-scan for the same thing would have the same property,
without the libxml dependencies that bother you here. Watch this space.

Regards,
-Chap

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-03-17 17:14:24 Re: jsonpath
Previous Message Jonathan S. Katz 2019-03-17 17:06:04 Re: jsonpath