| From: | Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> |
|---|---|
| To: | Rob Sargent <robjsargent(at)gmail(dot)com> |
| Cc: | James Cloos <cloos(at)jhcloos(dot)com>, pgsql-sql(at)postgresql(dot)org |
| Subject: | Re: large xml database |
| Date: | 2010-10-31 21:16:09 |
| Message-ID: | AANLkTimQC=tg6g3vsYb5X55MUHcaAOmMH_xZskxQNFBz@mail.gmail.com |
| Lists: | pgsql-sql |
On Sun, Oct 31, 2010 at 9:42 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>
>
>
> Viktor Bojović wrote:
>
>>
>>
>> On Sun, Oct 31, 2010 at 2:26 AM, James Cloos <cloos(at)jhcloos(dot)com <mailto:
>> cloos(at)jhcloos(dot)com>> wrote:
>>
>> >>>>> "VB" == Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com
>>
>> <mailto:viktor(dot)bojovic(at)gmail(dot)com>> writes:
>>
>> VB> i have a very big XML document, larger than 50GB, and want to
>> VB> import it into the database and transform it into a relational schema.
>>
>> Were I doing such a conversion, I'd use perl to convert the xml into
>> something which COPY can grok. Any other language, script or compiled,
>> would work just as well. The goal is to avoid having to slurp the
>> whole
>> xml structure into memory.
>>
>> -JimC
>> --
>> James Cloos <cloos(at)jhcloos(dot)com <mailto:cloos(at)jhcloos(dot)com>>
>>
>> OpenPGP: 1024D/ED7DAEA6
>>
>>
>> The insertion into the database is not a very big problem.
>> I insert it as XML docs, as varchar lines, or as XML docs in varchar
>> format. Usually I use a transaction and commit after each block of 1000
>> inserts, and it goes very fast, so insertion is done after a few hours.
>> But the problem occurs when I want to transform it inside the database
>> from XML (varchar or XML format) into tables by parsing.
>> That processing takes too much time in the database, no matter whether it
>> is stored as varchar lines, varchar nodes, or the XML data type.
>>
>> --
>> ---------------------------------------
>> Viktor Bojović
>>
>> ---------------------------------------
>> Wherever I go, Murphy goes with me
>>
>
> Are you saying you first load the xml into the database, then parse that
> xml into instance of objects (rows in tables)?
>
>
Yes. That way takes less RAM than using Twig or SimpleXML, so I tried using
the PostgreSQL XML functions or regexes.
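The streaming approach suggested upthread (parse the XML incrementally and
feed COPY, never slurping the whole document into memory) could be sketched
as follows. This is only an illustration: the `<entry>`, `<name>`, and
`<sequence>` element names and the `entries` table are hypothetical
placeholders, not the actual schema from this thread.

```python
# Streaming conversion sketch: walk a huge XML file with
# xml.etree.ElementTree.iterparse, emitting one tab-separated row per
# record element, and clearing each parsed subtree so memory stays flat.
import xml.etree.ElementTree as ET

def xml_to_copy_rows(source):
    """Yield one tab-separated row per <entry> element (hypothetical tag)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            name = elem.findtext("name", default="")
            seq = elem.findtext("sequence", default="")
            yield f"{name}\t{seq}"
            elem.clear()  # free the subtree we just consumed
```

The generated rows can then be written to a file and loaded in one pass,
e.g. with `\copy entries(name, sequence) from rows.tsv` in psql, which
avoids both per-row INSERT overhead and in-database XML parsing.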
--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
| Attachment | Content-Type | Size |
|---|---|---|
| 100001.xml.gz | application/x-gzip | 2.8 KB |