From: | patrick mc allister <pma(at)rdorte(dot)org> |
---|---|
To: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Using XML_PARSE_HUGE in operations on xml fields? |
Date: | 2014-03-04 15:54:21 |
Message-ID: | 20140304155421.GM23803@rdorte.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I ran into trouble with an xpath expression on a large XML file:
SELECT id, xpath('//tei:div/descendant::tei:head/text()', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1;
returns:
ERROR: could not parse XML document
DETAIL: line 491482: internal error: Huge input lookup
स्थापितः।। </p>
^
line 491482: Extra content at the end of the document
स्थापितः।। </p>
^
I think this is an error that comes from the libxml2 library, since
pretty much the same thing happens with xmllint if you pass the
`--memory' option.
This is the libxml2 documentation which is relevant here, I think:
"""
#define XML_MAX_DICTIONARY_LIMIT
Maximum size allowed by the parser for a dictionary by default This is
not a limitation of the parser but a safety boundary feature, use
XML_PARSE_HUGE option to override it. Introduced in 2.9.0
"""
(see http://xmlsoft.org/html/libxml-parserInternals.html#XML_MAX_LOOKUP_LIMIT)
So I was wondering if and how I could set that XML_PARSE_HUGE option
in postgresql? I couldn't find anything in the docs or in this list's
archives.
If you want to replicate the problem quickly, with one approx 37MB xml
file and one smaller one:
createdb xmlpost
psql xmlpost
CREATE TABLE test (id integer, x xml);
\set content `curl http://sarit.indology.info/downloads/mahabharata-devanagari.xml`
INSERT INTO test (SELECT 1, (SELECT XMLPARSE (DOCUMENT :'content')));
\set content `curl http://sarit.indology.info/downloads/ratnakIrti-nibandhAvali.xml`
INSERT INTO test (SELECT 2, (SELECT XMLPARSE (DOCUMENT :'content')));
SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1; -- fails
SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=2; -- works: 13
Thanks for any hints,
--
patrick
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Klaver | 2014-03-04 16:00:13 | Re: Role Inheritance Without Explicit Naming? |
Previous Message | Adrian Klaver | 2014-03-04 15:42:01 | Re: Segmentation fault |