Re: large xml database

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: large xml database
Date: 2010-10-31 19:36:47
Message-ID: AANLkTi=Gg_uVB7B-tUNDH6HVs0oYxrBkYAzogMe05SUE@mail.gmail.com
Lists: pgsql-sql

On Sun, Oct 31, 2010 at 7:08 AM, Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de> wrote:

> On Sat, 30 Oct 2010 23:49:29 +0200
> Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> wrote:
>
> >
> > Many tries have failed because 8 GB of RAM and 10 GB of swap were not
> > enough. Also, sometimes I get an error that more than 2^32 operations
> > were performed, and the functions stop working.
> >
> We have a similar problem, and we use the Amara XML Toolkit for Python.
> To avoid the big memory consumption, use pushbind. A 30 GB BME catalog
> file takes at most about 20 min to import. It could be faster, because
> we are preparing complex objects with an ORM, so the time consumption
> depends on how complex the catalog is. If you use Amara only to perform
> a conversion from XML to CSV, the final import can be done much faster.
>
> regards
>
> --
> Lutz
>
> http://www.4c-gmbh.de
>
>
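For reference, the XML-to-CSV route Lutz describes can be sketched with the Python standard library alone, streaming one record at a time with `xml.etree.ElementTree.iterparse` (this stands in for Amara's pushbind, which is not shown here; the `entry` tag and column names are made up for illustration):

```python
import csv
import xml.etree.ElementTree as ET

def xml_to_csv(xml_source, csv_path, record_tag, fields):
    """Stream the XML one element at a time and write one CSV row per
    <record_tag> element; memory use stays roughly constant because each
    finished subtree is cleared as soon as its row is written."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        for _event, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag == record_tag:
                writer.writerow([elem.findtext(f, "") for f in fields])
                elem.clear()  # free the subtree we just wrote
```

The resulting file can then be bulk-loaded on the PostgreSQL side, e.g. `COPY entries FROM '/path/to/entries.csv' CSV HEADER;` (table name hypothetical), which is far faster than per-row INSERTs.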
Thanks, Lutz. I will try Amara, and I will also try to parse it with SAX.
I have tried twig and some other parsers, but they consumed too much RAM.
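For the SAX route, a minimal streaming handler could look like this (the `entry` element name and the callback are hypothetical, not from the thread; only the element currently being read is held in memory):

```python
import xml.sax

class EntryHandler(xml.sax.ContentHandler):
    """Collect the text of each <entry> element and hand the finished
    record to a callback, keeping memory use flat for any file size."""
    def __init__(self, callback, record_tag="entry"):
        super().__init__()
        self.callback = callback
        self.record_tag = record_tag
        self.buffer = None  # None means: not inside a record

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.buffer = []

    def characters(self, content):
        # May be called several times per element; accumulate chunks.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            self.callback("".join(self.buffer).strip())
            self.buffer = None

def parse_entries(source, callback):
    """Parse a filename or file-like object, invoking callback per record."""
    xml.sax.parse(source, EntryHandler(callback))
```

The callback would typically execute an INSERT (or append to a COPY stream) rather than collect records in a list, so nothing accumulates in memory.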

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
