From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Joachim Worringen <joachim(dot)worringen(at)iathh(dot)de>, pgsql-general(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> |
Subject: | Re: INSERTing lots of data |
Date: | 2010-06-01 10:14:01 |
Message-ID: | 8739x7m55i.fsf@hi-media-techno.com |
Lists: | pgsql-general |
Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Joachim Worringen wrote:
>> my Python application (http://perfbase.tigris.org) repeatedly needs to
>> insert lots of data into an existing, non-empty, potentially large
>> table. Currently, the bottleneck is with the Python application, so I
>> intend to multi-thread it. Each thread should work on a part of the input
>> file.
>
> You are wandering down a path followed by pgloader at one point:
> http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
> briefly as well. Simple multi-threading can be of minimal help in scaling
> up insert performance here, due to the Python issues involved with the GIL.
> Maybe we get Dimitri to chime in here, he did more of this than I did.
In my case pgloader is using COPY, not INSERT, which would mean that
while one Python thread is blocked on network I/O the others have a
chance of using the CPU. That should be a case where the GIL works
OK. My tests show that it isn't.
> Two thoughts. First, build a test performance case assuming it will fail to
> scale upwards, looking for problems. If you get lucky, great, but don't
> assume this will work--it's proven more difficult than is obvious in the
> past for others.
>
> Second, if you do end up being throttled by the GIL, you can probably build
> a solution for Python 2.6/3.0 using the multiprocessing module for your use
> case: http://docs.python.org/library/multiprocessing.html
My plan was to go with http://docs.python.org/library/subprocess.html
but it seems multiprocessing is easier to use when you want to port
existing threaded code.
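
For illustration, a minimal sketch of the multiprocessing route, assuming the
input is a list of comma-separated lines and that the CPU-bound part is turning
them into COPY text format. The helper names (split_into_chunks, to_copy_text,
prepare_in_parallel) are hypothetical and are not taken from pgloader or
perfbase; the database side is left out, and each worker would presumably open
its own connection and run its own COPY.

```python
import multiprocessing


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into at most n_chunks contiguous slices."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def to_copy_text(chunk):
    """CPU-bound step: turn 'a, b, c' lines into tab-separated COPY text.

    Assumption: fields contain no tabs, newlines, or backslashes, so no
    escaping is done here. Real input would need proper escaping.
    """
    rows = ("\t".join(field.strip() for field in line.split(","))
            for line in chunk)
    return "\n".join(rows) + "\n"


def prepare_in_parallel(lines, n_workers=4):
    """Run to_copy_text on each chunk in a separate process (no shared GIL)."""
    chunks = split_into_chunks(lines, n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        return pool.map(to_copy_text, chunks)


if __name__ == "__main__":
    sample = ["1, foo, 3.5", "2, bar, 4.0", "3, baz, 1.25"]
    for block in prepare_in_parallel(sample, n_workers=2):
        # Each block could be fed to a COPY ... FROM STDIN by its own worker.
        print(repr(block))
```

Because each worker is a separate process with its own interpreter, the parsing
step is not serialized by the GIL the way it is with threads. Whether that
translates into faster loading still has to be measured, as Greg says.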
Thanks Greg!
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support