From: | Dominique Devienne <ddevienne(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Erik Wienhold <ewie(at)ewie(dot)name>, Sai Teja <saitejasaichintalapudi(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL |
Date: | 2023-09-08 09:45:47 |
Message-ID: | CAFCRh-_qvsOMCNSTQf_q8jsqn_Mfzm6pHApG1WJJaGqYKkQU2g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Sep 8, 2023 at 11:39 AM Dominique Devienne <ddevienne(at)gmail(dot)com>
wrote:
> On Thu, Sep 7, 2023 at 10:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Erik Wienhold <ewie(at)ewie(dot)name> writes:
>> > Looks like "Huge input lookup" as reported in [1] (also from Sai) and
>> that
>> > error is from libxml.
>>
>> Ah, thanks for the pointer. It looks like for the DOCUMENT case,
>> we could maybe relax this restriction by passing the XML_PARSE_HUGE
>> option to xmlCtxtReadDoc(). However, there are things to worry about:
>>
>
> Just a remark from the sidelines, from someone having done a fair bit of
> XML in years past.
>
> That XPath is simple, and a streaming parser (SAX or StAX) could handle
> it. While that
> XML_PARSE_HUGE option probably applies to a DOM parser. So is there a
> work-around
> to somehow force using a streaming parser instead of one that must produce
> the whole Document,
> just so a few elements are picked out of it? FWIW. --DD
>
If push comes to shove, the streaming-based extraction can be done outside
the DB, stored in a new column
or table, and index that instead. This is in fact exactly the approach I
took on one server handling XML I wrote.
To be honest, in my case, the XMLs were never large, so I used rapidxml
which is also a DOM parser,
but the same principle applies though, i.e. extract the data from the XML
outside the DB using
SAX (push) / StAX (pull), to avoid having a (too) large document in memory
at any time (client or server side). --DD
From | Date | Subject | |
---|---|---|---|
Next Message | duc hiep ha | 2023-09-08 10:19:01 | Is data import from Oracle table to Postgres table in Ora2pg consecutive or random? |
Previous Message | Dominique Devienne | 2023-09-08 09:39:00 | Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL |