Re: any solution for doing a data file import spawning it on multiple processes

From: "hb(at)101-factory(dot)eu" <hb(at)101-factory(dot)eu>
To: Edson Richter <edsonrichter(at)hotmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: any solution for doing a data file import spawning it on multiple processes
Date: 2012-06-16 15:59:55
Message-ID: 8F3382BB-F0C5-48C0-8791-448EB3349AEA@101-factory.eu
Lists: pgsql-general

Thanks. I thought about splitting the file, but that did not work out well.

We receive 2 files every 30 seconds and need to import them as fast as possible.

We do not run Java currently, but maybe it's an option.
Are you willing to share your code?

I was also thinking of using Perl for it.

henk

On 16 jun. 2012, at 17:37, Edson Richter <edsonrichter(at)hotmail(dot)com> wrote:

> Em 16/06/2012 12:04, hb(at)101-factory(dot)eu escreveu:
>> hi there,
>>
>> I am trying to import large data files into PostgreSQL.
>> For now I use the xargs Linux command to feed the file line by line, using the maximum number of available connections.
>>
>> We use pgpool as the connection pool to the database, trying to maximize concurrent import of the file.
>>
>> The problem: it mostly seems to work well, but we miss a line once in a while, and that is not acceptable. It also creates zombie processes ;(.
>>
>> Does anybody have any other tricks that will do the job?
>>
>> thanks,
>>
>> Henk
>
> I've used a custom Java application with connection pooling (limited to 1000 connections, meaning up to 1000 concurrent file imports).
>
> I'm able to import more than 64,000 XML files (about 13 KB each) in 5 minutes, with no memory leaks, no zombies, and (of course) no missing records.
>
> Besides having each thread import a separate file, I also have a setup where separate threads import different lines of the same file. No problems at all. Don't forget to check your OS open-file limits (that was a big issue for me in the past, due to Lucene indexes generated during import).
>
> Server: 8-core Xeon, 16 GB RAM, 15,000 rpm SAS disks, PostgreSQL 9.1.3, CentOS 5 Linux, Sun Java 1.6.27.
>
> Regards,
>
> Edson Richter
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
