Re: Best COPY Performance

From: "Spiegelberg, Greg" <gspiegelberg(at)cranel(dot)com>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Worky Workerson" <worky(dot)workerson(at)gmail(dot)com>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Best COPY Performance
Date: 2006-10-30 14:09:32
Message-ID: 82E74D266CB9B44390D3CCE44A781ED90177807C@POSTOFFICE.cranel.local
Lists: pgsql-performance

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> Luke Lonergan
> Sent: Saturday, October 28, 2006 12:07 AM
> To: Worky Workerson; Merlin Moncure
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] Best COPY Performance
>
> Worky,
>
> On 10/27/06 8:47 PM, "Worky Workerson"
> <worky(dot)workerson(at)gmail(dot)com> wrote:
>
> > Are you saying that I should be able to issue multiple COPY commands
> > because my I/O wait is low? I was under the impression that I am I/O
> > bound, so multiple simultaneous loads would have a detrimental
> > effect ...
>
> ...
> I agree with Merlin that you can speed things up by breaking the file
> up. Alternatively, you can use the OSS Bizgres Java loader, which lets
> you specify the number of I/O threads with the "-n" option on a single
> file.

As a result of this thread, and because I've tried this in the past
without much success at speeding the process up, I attempted just that
here, except via two psql sessions with access to the local file. As a
baseline, 1.1M rows of data varying in width from 40 to 200 characters,
COPY'd into a table with only one text column and no keys or indexes,
took about 15 seconds to load: ~73K rows/second.

I then broke that file into two files of 550K rows each and ran two
simultaneous COPYs after dropping and recreating the table, issuing a
sync on the system to be sure, etc. Nearly every time, both COPYs
finished in 12 seconds. That is about 20% less load time, or ~91K
rows/second (roughly 25% more throughput).
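
For anyone who wants to repeat the split-and-load test, here is a rough
sketch in Python of the procedure described above. It is not the exact
method used for the numbers in this post: the helper names
(split_lines, copy_command, parallel_copy) are made up for
illustration, the use of psql's client-side \copy is an assumption, and
splitting on line boundaries assumes text-format COPY data with one row
per line.

```python
import subprocess
from pathlib import Path


def split_lines(src: Path, parts: int, out_dir: Path) -> list[Path]:
    """Split src into `parts` files with near-equal line counts.

    Assumes one row per line (COPY text format); CSV data with embedded
    newlines inside quoted fields would need a real parser instead.
    """
    with src.open("r", encoding="utf-8", newline="") as f:
        total = sum(1 for _ in f)
    base, extra = divmod(total, parts)
    sizes = [base + (1 if i < extra else 0) for i in range(parts)]

    chunks = []
    with src.open("r", encoding="utf-8", newline="") as f:
        for i, size in enumerate(sizes):
            chunk = out_dir / f"{src.name}.part{i}"
            with chunk.open("w", encoding="utf-8", newline="") as out:
                for _ in range(size):
                    out.write(f.readline())
            chunks.append(chunk)
    return chunks


def copy_command(dbname: str, table: str, chunk: Path) -> list[str]:
    """Build a psql invocation that loads one chunk via client-side \\copy."""
    return ["psql", "-d", dbname, "-c", f"\\copy {table} from '{chunk}'"]


def parallel_copy(dbname: str, table: str, chunks: list[Path]) -> list[int]:
    """Start one psql per chunk, then wait for all; returns exit codes."""
    procs = [subprocess.Popen(copy_command(dbname, table, c)) for c in chunks]
    return [p.wait() for p in procs]
```

Something like parallel_copy("testdb", "load_test", split_lines(src, 2,
out_dir)) would then start two loads at once; the database and table
names there are placeholders.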

Admittedly, this was a pretty rough test, but a 20% savings, if it
carries over to production, is worth exploring for us.
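
To make the arithmetic behind those figures explicit, here is a small
Python check using only the numbers reported above (1.1M rows, about 15
seconds for the single COPY, about 12 seconds for the two parallel
COPYs). The timings are rounded, so treat the results as approximate.

```python
ROWS = 1_100_000          # total rows loaded in both tests
SINGLE_SECONDS = 15       # one COPY of the whole file
PARALLEL_SECONDS = 12     # two simultaneous COPYs of 550K rows each


def rows_per_second(rows: int, seconds: float) -> float:
    """Load throughput in rows per second."""
    return rows / seconds


single = rows_per_second(ROWS, SINGLE_SECONDS)
parallel = rows_per_second(ROWS, PARALLEL_SECONDS)

# Fraction of wall-clock time saved by the parallel load.
time_saved = (SINGLE_SECONDS - PARALLEL_SECONDS) / SINGLE_SECONDS
# Relative increase in throughput (not the same number as time saved).
throughput_gain = parallel / single - 1

print(f"single:   {single:,.0f} rows/s")      # single:   73,333 rows/s
print(f"parallel: {parallel:,.0f} rows/s")    # parallel: 91,667 rows/s
print(f"time saved:      {time_saved:.0%}")       # time saved:      20%
print(f"throughput gain: {throughput_gain:.0%}")  # throughput gain: 25%
```

The 20% figure is the reduction in load time; measured as throughput,
the same result is about a 25% increase.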

Because I'll be asked: I did this on an idle, dual 3.06GHz Xeon with 6GB
of memory, U320 SCSI internal drives, and PostgreSQL 8.1.4.

Greg
