From: | "Phillip Sitbon" <phillip(at)sitbon(dot)net> |
---|---|
To: | pgsql-novice(at)postgresql(dot)org |
Subject: | Re: Concurrent COPY commands |
Date: | 2008-07-09 16:35:15 |
Message-ID: | 536685ea0807090935i2e191146y7f3acf5bba4ca0ea@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
Sorry about the late reply.
I only have two fast SATA drives on software RAID, but that really isn't the
issue- while the copy commands are going, disk activity is relatively low.
By relatively I mean that I have seen it a lot higher under certain
circumstances, and I know for sure the disks aren't holding anything back. I
know it's a bad comparison, but the process generating this huge amount of
data can write directly to the disk very fast and still be CPU-bound, while
it eventually ends up waiting for postgres when I try to pipe it into the
database. I figured some overhead was to be expected and that's why I tried
the parallel setup in the first place.
What I see is that after some buffering (not sure it is buffering, but after
it gets some data), one postgres process will ramp up to 100% CPU (on one
core) for some time, thus blocking its input FIFO. That is when the hard
drive activity goes up a bit, but whatever it is doing is definitely
CPU-bound on that core.
No more than one worker process does this at a time. And no matter what kind
of FIFO buffers and select() calls I use, the calling process eventually
gets blocked because the postgres processes don't appear to be working in
parallel as well as they could be; hence, postgres doesn't take in any more
data for a while. I'm really curious about why going parallel x6 is so much
slower than one process when the disks aren't being pushed that hard
compared to their capabilities.
I am suspecting something wrong with my config, but I can't be sure. Is 1-2
GB for work_mem ok? Would that hurt it?
On a positive note, I let the single-process version run to completion and I
now have a solid TB of data that I can access and use at lightning speed :)
Cheers,
Phillip
On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson(at)simkin(dot)ca> wrote:
> On Wednesday 02 July 2008, Phillip Sitbon <phillip(at)sitbon(dot)net> wrote:
> > Hello,
> >
> > I am running some queries that use multiple connections to issue COPY
> > commands which bring data into the same table via different files (FIFOs
> > to be precise). This is being done on a SMP machine and I am noticing
> > that none of the postgres worker processes operate in parallel, even
> > though there is data available to all of them. The performance is nearly
> > exactly the same as it is for issuing a single COPY command.
> > Is this
> > normal behavior, even with all of the separate transactions still in
> > progress? Would I be better off doing multithreaded bulk inserts from my
> > C program rather than sending the data to FIFOs?
>
> Sounds like you're I/O bound - I doubt any other concurrency mechanism will
> change that much.
>
> >
> > The machine I am using has 16GB of memory and 8 cores, so I've tried to
> > optimize the configuration accordingly but I am a little lost in some
> > places.
>
> Ah, but what does your RAID controller and drives look like?
>
>
> --
> Alan
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>
From | Date | Subject | |
---|---|---|---|
Next Message | Alan Hodgson | 2008-07-09 16:45:37 | Re: Concurrent COPY commands |
Previous Message | Emil Obermayr | 2008-07-09 07:12:50 | Re: how to get dependancies of a table? |