From: | Andreas Joseph Krogh <andreak(at)officenet(dot)no> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: large xml database |
Date: | 2010-10-30 22:06:25 |
Message-ID: | 4CCC96E1.9050806@officenet.no |
Lists: | pgsql-sql |
On 10/30/2010 11:49 PM, Viktor Bojović wrote:
> Hi,
> I have a very big XML document, larger than 50 GB, and want to import
> it into a database and transform it to a relational schema.
> When splitting this document into smaller independent XML documents I get
> ~11.1 million XML documents.
> I have spent lots of time trying to find the fastest way to transform all this
> data, but every time I give up because it takes too much time. Sometimes it
> would take more than a month if not stopped.
> I have tried inserting each line as varchar into the database and parsing it
> using plperl regex.
> I have also tried storing every document as XML and parsing it, but that is
> also too slow.
> I have tried storing every document as varchar, but it is also slow when
> using regex to get the data.
>
> Many attempts have failed because 8 GB of RAM and 10 GB of swap were not enough.
> Also, sometimes I get a message that more than 2^32 operations were performed,
> and the functions stopped working.
>
> I just wanted to ask if someone knows how to speed this up.
>
> Thanks in advance
Use a SAX-parser and handle the endElement(String name) events to insert
the element's content into your db.
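
A rough sketch of that idea, in Python rather than Java because its xml.sax
handler has the same endElement(name) shape. A SAX parser streams the file, so
memory use does not grow with the size of the document. Everything specific
here is an assumption made for illustration: the record element name "entry",
the table "entries", and sqlite3 standing in for the real database so the
example runs on its own. With PostgreSQL you would swap in your driver's
connection and placeholder style.

```python
import io
import sqlite3
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect the text of each record element and insert it in batches."""

    def __init__(self, conn, record_tag="entry", batch_size=1000):
        super().__init__()
        self.conn = conn
        self.record_tag = record_tag  # hypothetical element name
        self.batch_size = batch_size
        self.batch = []
        self.text = None  # None while the parser is outside a record

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.text = []  # start collecting character data

    def characters(self, content):
        # May be called several times per element, so accumulate the pieces.
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            self.batch.append(("".join(self.text).strip(),))
            self.text = None
            if len(self.batch) >= self.batch_size:
                self.flush()

    def endDocument(self):
        self.flush()  # write whatever is left in the last partial batch

    def flush(self):
        if self.batch:
            self.conn.executemany(
                "INSERT INTO entries (content) VALUES (?)", self.batch
            )
            self.conn.commit()
            self.batch = []


def load(source, conn, **handler_options):
    """Parse source (a file name or file object) and fill the entries table."""
    conn.execute("CREATE TABLE IF NOT EXISTS entries (content TEXT)")
    xml.sax.parse(source, RecordHandler(conn, **handler_options))


if __name__ == "__main__":
    sample = io.BytesIO(b"<root><entry>first</entry><entry>second</entry></root>")
    db = sqlite3.connect(":memory:")
    load(sample, db)
    print(db.execute("SELECT count(*) FROM entries").fetchone()[0])
```

Batching the inserts and committing once per batch is a design choice in this
sketch, not something the advice above requires. It avoids one transaction per
row, which usually matters when there are millions of records.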
--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / CTO
Public key: http://home.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+