Re: large xml database

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: James Cloos <cloos(at)jhcloos(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: large xml database
Date: 2010-10-31 21:16:09
Message-ID: AANLkTimQC=tg6g3vsYb5X55MUHcaAOmMH_xZskxQNFBz@mail.gmail.com
Lists: pgsql-sql

On Sun, Oct 31, 2010 at 9:42 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:

>
>
>
> Viktor Bojović wrote:
>
>>
>>
>> On Sun, Oct 31, 2010 at 2:26 AM, James Cloos <cloos(at)jhcloos(dot)com> wrote:
>>
>> >>>>> "VB" == Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> writes:
>>
>> VB> I have a very big XML document, larger than 50GB, and want to
>> VB> import it into the database and transform it into a relational schema.
>>
>> Were I doing such a conversion, I'd use perl to convert the xml into
>> something which COPY can grok. Any other language, script or compiled,
>> would work just as well. The goal is to avoid having to slurp the
>> whole
>> xml structure into memory.
>>
>> -JimC
>> --
>> James Cloos <cloos(at)jhcloos(dot)com>
>>
>> OpenPGP: 1024D/ED7DAEA6
>>
>>
>> The insertion into the database is not a big problem.
>> I insert it as XML documents, as varchar lines, or as XML documents in
>> varchar format. Usually I use a transaction and commit after each block of
>> 1000 inserts, and it goes very fast, so the insertion is finished after a
>> few hours.
>> But the problem occurs when I want to transform it inside the database,
>> from XML (varchar or XML format) into tables, by parsing.
>> That processing takes too much time in the database, no matter whether it
>> is stored as varchar lines, varchar nodes, or the XML data type.
>>
>> --
>> ---------------------------------------
>> Viktor Bojović
>>
>> ---------------------------------------
>> Wherever I go, Murphy goes with me
>>
>
> Are you saying you first load the xml into the database, then parse that
> xml into instance of objects (rows in tables)?
>
>
Yes. That way takes less RAM than using Twig or SimpleXML, so I tried using
PostgreSQL's XML functions or regexes.
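
For comparison, a minimal sketch (in Python rather than Perl) of the streaming
conversion James suggested. The <entry>, <accession> and <name> element names
and the target table/columns are hypothetical; a real script would also need
to escape tabs and newlines for COPY:

import sys
import xml.etree.ElementTree as ET

def xml_to_copy(path, out=sys.stdout):
    # Stream the document; only the current record subtree is kept in memory.
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)                            # grab the root element once
    for event, elem in context:
        if event == "end" and elem.tag == "entry":     # hypothetical record element
            accession = elem.findtext("accession", "")
            name = elem.findtext("name", "")
            out.write("%s\t%s\n" % (accession, name))  # tab-separated row for COPY
            root.clear()                               # drop processed entries

if __name__ == "__main__":
    xml_to_copy(sys.argv[1])

The resulting file can then be bulk-loaded in one pass, e.g. with
COPY entry(accession, name) FROM '/path/to/output.tsv';
instead of parsing the XML row by row inside the database.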

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me

Attachment: 100001.xml.gz (application/x-gzip, 2.8 KB)
