From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: large xml database
Date: 2010-10-31 19:36:47
Message-ID: AANLkTi=Gg_uVB7B-tUNDH6HVs0oYxrBkYAzogMe05SUE@mail.gmail.com
Lists: pgsql-sql
On Sun, Oct 31, 2010 at 7:08 AM, Lutz Steinborn <l(dot)steinborn(at)4c-ag(dot)de> wrote:
> On Sat, 30 Oct 2010 23:49:29 +0200
> Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> wrote:
>
> >
> > Many attempts have failed because 8 GB of RAM and 10 GB of swap were
> > not enough. Also, sometimes I get an error that more than 2^32
> > operations were performed, and the functions stop working.
> >
> We have a similar problem, and we use the Amara XML toolkit for Python. To
> avoid the big memory consumption, use pushbind. A 30 GB bme catalog file
> takes at most about 20 minutes to import. It could probably be faster,
> because we are building complex objects with an ORM, so the time needed
> depends on how complex the catalog is. If you use Amara only to perform a
> conversion from XML to CSV, the final import can be done much faster.
>
> regards
>
> --
> Lutz
>
> http://www.4c-gmbh.de
>
>
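The XML-to-CSV conversion suggested above can be sketched with Python's built-in SAX parser, which streams the document instead of building a tree, so memory use stays constant regardless of file size. The element and column names here ("entry", "id", "name") are hypothetical placeholders for whatever the real catalog schema uses:

```python
# Streaming XML -> CSV conversion with a SAX handler: one CSV row is
# emitted per <entry> element, and nothing is kept in memory afterwards.
# Element names are placeholders, not the real catalog schema.
import csv
import io
import xml.sax

class EntryToCsv(xml.sax.ContentHandler):
    def __init__(self, out):
        super().__init__()
        self.writer = csv.writer(out)
        self.row = None      # dict for the <entry> currently being read
        self.field = None    # name of the text field currently being read
        self.buf = []        # accumulated character data for that field

    def startElement(self, name, attrs):
        if name == "entry":
            self.row = {}
        elif self.row is not None and name in ("id", "name"):
            self.field = name
            self.buf = []

    def characters(self, content):
        # SAX may deliver text in several chunks; collect them all.
        if self.field is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == self.field:
            self.row[name] = "".join(self.buf).strip()
            self.field = None
        elif name == "entry" and self.row is not None:
            self.writer.writerow(
                [self.row.get("id", ""), self.row.get("name", "")])
            self.row = None

xml_data = b"<catalog><entry><id>1</id><name>foo</name></entry></catalog>"
out = io.StringIO()
xml.sax.parseString(xml_data, EntryToCsv(out))
print(out.getvalue().strip())  # -> 1,foo
```

The resulting CSV file can then be bulk-loaded with PostgreSQL's COPY command, which is far faster than row-by-row INSERTs, matching the "final import can be done much faster" point above.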
Thanks Lutz, I will try Amara, and I will also try parsing it with SAX.
I have tried twig and some other parsers, but they consumed too much RAM.
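As an alternative to a hand-written SAX handler, the standard library's `xml.etree.ElementTree.iterparse` gives a similar constant-memory stream with much less code, as long as each element is cleared once it has been processed. The tag names below are hypothetical:

```python
# Constant-memory iteration over a large XML document with iterparse.
# Clearing each <entry> after use keeps RAM bounded even for huge files.
# Tag names ("entry", "id", "name") are placeholders.
import io
import xml.etree.ElementTree as ET

def stream_entries(source):
    """Yield (id, name) for each <entry>, discarding the subtree after use."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            yield (elem.findtext("id"), elem.findtext("name"))
            elem.clear()  # free the finished subtree so memory stays flat

xml_data = "<catalog><entry><id>1</id><name>foo</name></entry></catalog>"
rows = list(stream_entries(io.StringIO(xml_data)))
print(rows)  # -> [('1', 'foo')]
```

This avoids the problem described above with tree-building parsers, which hold the whole document in RAM at once.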
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me