Re: large xml database

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: large xml database
Date: 2010-10-31 19:36:47
Message-ID: AANLkTi=Gg_uVB7B-tUNDH6HVs0oYxrBkYAzogMe05SUE@mail.gmail.com
Lists: pgsql-sql

On Sun, Oct 31, 2010 at 7:08 AM, Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de> wrote:

> On Sat, 30 Oct 2010 23:49:29 +0200
> Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> wrote:
>
> >
> > Many tries have failed because 8 GB of RAM and 10 GB of swap were not
> > enough. Also, sometimes I get an error that more than 2^32 operations
> > were performed, and the functions stop working.
> >
> We have a similar problem, and we use the Amara XML Toolkit for Python.
> To avoid the big memory consumption, use pushbind. A 30 GB BME catalog
> file takes at most about 20 min to import. It could be faster, because
> we are preparing complex objects with an ORM, so the time consumption
> depends on how complex the catalog is. If you use Amara only to perform
> a conversion from XML to CSV, the final import can be done much faster.
>
> regards
>
> --
> Lutz
>
> http://www.4c-gmbh.de
>
>
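For reference, the XML-to-CSV route Lutz describes can be sketched with the Python standard library alone, streaming one record at a time with `xml.etree.ElementTree.iterparse` (this stands in for Amara's pushbind, which is not shown here; the `entry` tag and column names are made up for illustration):

```python
import csv
import xml.etree.ElementTree as ET

def xml_to_csv(xml_source, csv_path, record_tag, fields):
    """Stream the XML one element at a time and write one CSV row per
    <record_tag> element; memory use stays roughly constant because each
    finished subtree is cleared as soon as its row is written."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        for _event, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag == record_tag:
                writer.writerow([elem.findtext(f, "") for f in fields])
                elem.clear()  # free the subtree we just wrote
```

The resulting file can then be bulk-loaded on the PostgreSQL side, e.g. `COPY entries FROM '/path/to/entries.csv' CSV HEADER;` (table name hypothetical), which is far faster than per-row INSERTs.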
Thanks, Lutz. I will try Amara, and I will also try to parse it with SAX.
I have tried twig and some other parsers, but they consumed too much RAM.
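For the SAX route, a minimal streaming handler could look like this (the `entry` element name and the callback are hypothetical, not from the thread; only the element currently being read is held in memory):

```python
import xml.sax

class EntryHandler(xml.sax.ContentHandler):
    """Collect the text of each <entry> element and hand the finished
    record to a callback, keeping memory use flat for any file size."""
    def __init__(self, callback, record_tag="entry"):
        super().__init__()
        self.callback = callback
        self.record_tag = record_tag
        self.buffer = None  # None means: not inside a record

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.buffer = []

    def characters(self, content):
        # May be called several times per element; accumulate chunks.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            self.callback("".join(self.buffer).strip())
            self.buffer = None

def parse_entries(source, callback):
    """Parse a filename or file-like object, invoking callback per record."""
    xml.sax.parse(source, EntryHandler(callback))
```

The callback would typically execute an INSERT (or append to a COPY stream) rather than collect records in a list, so nothing accumulates in memory.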

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
